<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Hedonistic Learning</title>
        <link>https://www.hedonisticlearning.com</link>
        <description><![CDATA[Mostly math, physics, and CS]]></description>
        <atom:link href="https://www.hedonisticlearning.com/rss.xml" rel="self"
                   type="application/rss+xml" />
        <lastBuildDate>Mon, 10 Nov 2025 00:36:38 UT</lastBuildDate>
        <item>
    <title>Umbral Calculus</title>
    <link>https://www.hedonisticlearning.com/posts/umbral-calculus.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>I’ll start by rationalizing an example of “old” umbral calculus from <a href="https://en.wikipedia.org/wiki/Umbral_calculus#19th-century_umbral_calculus">Wikipedia</a>.
|\newcommand{pair}[2]{\langle{#1}\mid{#2}\rangle}|
|\newcommand{bigpair}[2]{\left\langle{#1}\ \middle|\ {#2}\right\rangle}|
|\newcommand{pseq}[2]{\{#1\}_{#2 \in \mathbb N}}|
|\newcommand{ucomp}[2]{#1_n(\underline #2(x))}|
<style>details summary { cursor: pointer; } details:open summary .expander { display: none; }</style>
<style>details { border: solid 1px; padding: 1ex; color: var(--list-group-item-color); margin-bottom: 1em; }</style></p>
<p>We know |(x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k|. We then “infer”
that |B_n(x+y) = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k| where
|B_n(x)| are the <a href="https://en.wikipedia.org/wiki/Bernoulli_polynomials">Bernoulli polynomials</a>.
Actually, the “old” style would be even more dubious. You’d have a “rule” like representing
|B_n(x+y)| as |(b+y)^n|, then expand like usual and replace |b^k = (b + 0)^k| with |B_k(x)|.
The variables like |b| were the “shadow” or “umbral” variables.</p>
<p>We can rationalize this using the techniques described below.
Let |\varepsilon_y| be the linear operator on polynomials satisfying
|\varepsilon_y p(x) = p(x + y)|. Since |D_x[\varepsilon_y p(x)] = \varepsilon_y D_x p(x)|
for all |y|, where |D_x| is differentiation with respect to |x|, |\varepsilon_y| is induced by a formal power
series in |D_x|. In particular, |\varepsilon_y = e^{yD_x}|.</p>
<p>Let |T| be the linear operator (the Sheffer operator) characterized by mapping |x^n| to
|B_n(x)|; it represents a change of basis on the vector space of polynomials. We can apply |T| to the
equation
\[ \varepsilon_y(x^n) = (x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k \]
to get via linearity
\[ T(\varepsilon_y(x^n)) = T((x+y)^n)
= \sum_{k=0}^n {n \choose k} T(x^{n-k}) y^k
= \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k
\]</p>
<p>The key property we then need is |T(\varepsilon_y(x^n)) = \varepsilon_y T(x^n) = B_n(x + y)|
which can be reduced to |D_xT(x^n) = T(D_x x^n)|. This is just the statement that
|D_x T(x^n) = D_x B_n(x) = nB_{n-1}(x) = nT(x^{n-1}) = T(nx^{n-1}) = T(D_x x^n)| using a
well-known property of Bernoulli (and other) polynomials. In fact, this relation implies that |T| is
itself induced by a formal power series and thus commutes with <em>any</em> other linear operator so
induced.</p>
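<p>As a quick, independent sanity check of the identity |B_n(x+y) = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k|, here is a small Python sketch using exact rational arithmetic. It assumes the common convention |B_1 = -\tfrac12| and builds |B_n(x) = \sum_{k=0}^n {n \choose k} B_k x^{n-k}| from the Bernoulli numbers; the function names are hypothetical.</p>

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(N):
    """First N Bernoulli numbers, using the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, N):
        # (m + 1) B_m = -sum_{k=0}^{m-1} C(m+1, k) B_k
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def bernoulli_poly(n, x, B):
    """Evaluate B_n(x) = sum_{k=0}^{n} C(n, k) B_k x^(n-k) exactly."""
    x = Fraction(x)
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))

def shifted_expansion(n, x, y, B):
    """Right-hand side of the umbral identity: sum_k C(n, k) B_{n-k}(x) y^k."""
    y = Fraction(y)
    return sum(comb(n, k) * bernoulli_poly(n - k, x, B) * y ** k for k in range(n + 1))

B = bernoulli_numbers(8)
x, y = Fraction(1, 3), Fraction(5, 2)   # arbitrary rational test points
for n in range(8):
    assert bernoulli_poly(n, x + y, B) == shifted_expansion(n, x, y, B)
```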
<p>Ultimately, the only properties we needed were that we had a (linear) change of basis from
the monomial basis to the polynomial sequence, which we’ll have for any polynomial
sequence whose |n|th element has degree |n|, and that the change of basis commuted with the |D_x|
operator. The latter is more stringent, but there are various ways we can expand the scope of the
argument.</p>
<p>First, |D_x x^n = \frac{c_n}{c_{n-1}} x^{n-1}| with |c_n = n!| completely characterizes
(with |D_x x^0 = 0|) the differentiation operation on polynomials. Choosing a different sequence
|c| leads to different notions of “differentiation”. This will change |\varepsilon_y| and lead to
different formulas, but they will be structurally similar.</p>
<p>In a different direction, we can ask for other formal power series and polynomial sequences that
relate to each other the way |D_x| and |x^n| do. We say that a polynomial sequence |s_n(x)| is
<strong>Sheffer</strong> for a pair of formal power series |(g(t), f(t))| with |\deg g = 0| and |\deg f = 1|
when |\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n|. (This inner-product-like notation will be
defined later, but the key thing is that this mirrors |\pair{t^k}{x^n} = c_n\delta_k^n|.)
This has |g(t)f(t)^k| taking the place of |t^k| and |s_n(x)| taking the place of |x^n|. This
would let us transfer identities involving the |s_n(x)| to any linear operator that commutes with
|g(t)f(t)|. While changing |c| changes our notion of “differentiation”, using a Sheffer sequence
allows us to consider other “differential operators” using the same notion of “differentiation”.</p>
<p>This article is based primarily on the work of Steven Roman and Gian-Carlo Rota.
It closely follows <a href="https://doi.org/10.1016/0022-247X(82)90154-8">The Theory of the Umbral Calculus. I</a>
by Steven Roman (1982). See also <a href="http://sroman.com/MBOOKS/MathUmbral.htm">The Umbral Calculus</a>
by Steven Roman (1984).</p>
<p>I’ll include proofs below to illustrate that each bit of reasoning is fairly straightforward.</p>
<!-- vim-markdown-toc GFM -->
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#conventions">Conventions</a></li>
<li><a href="#formal-power-series">Formal Power Series</a></li>
<li><a href="#linear-functionals">Linear Functionals</a>
<ul>
<li><a href="#some-properties">Some Properties</a></li>
<li><a href="#evaluation-functional">Evaluation Functional</a></li>
<li><a href="#formal-derivative">Formal Derivative</a></li>
</ul></li>
<li><a href="#linear-operators">Linear Operators</a>
<ul>
<li><a href="#evaluation-operator">Evaluation Operator</a></li>
<li><a href="#characterizing-linear-operators-induced-from-formal-power-series">Characterizing Linear Operators Induced from Formal Power Series</a></li>
</ul></li>
<li><a href="#polynomial-sequences">Polynomial Sequences</a></li>
<li><a href="#recurrence-formulas">Recurrence Formulas</a></li>
<li><a href="#transfer-formulas">Transfer Formulas</a></li>
<li><a href="#umbral-composition-and-transfer-operators">Umbral Composition and Transfer Operators</a></li>
<li><a href="#example-chebyshev-polynomials">Example: Chebyshev Polynomials</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
<!-- vim-markdown-toc -->
<h2 id="overview">Overview</h2>
<p>One of the key ideas of umbral calculus is the relationship among four vector spaces: the space of
polynomials; its dual space, i.e. the linear functionals on the space of polynomials; the linear
operators, i.e. endomorphisms, on the space of polynomials; and the space of formal power series. It
will turn out that the space of formal power series and the space of linear functionals on the space of
polynomials are isomorphic not just as vector spaces but as algebras. Further, we can embed
the space of formal power series as a commutative sub-algebra of the space of linear operators on
the space of polynomials. This latter embedding views formal power series as differential operators on
polynomials. This gives us three perspectives on formal power series and two ways they interact
with polynomials.</p>
<p>We’ll also be interested in linear operators on the space of polynomials that <em>aren’t</em> induced by
formal power series such as the transfer or umbral operators and umbral shift operators. A transfer
operator is essentially a “well-behaved” change of basis from the monomial basis. Umbral shift
operators generalize the “multiply by |x|” operator. These operators will have adjoints that are
linear operators on the space of formal power series.</p>
<p>We’ll find that a linear operator on the space of formal power series is an adjoint of a linear
operator on the space of polynomials if and only if it is continuous in a sense to be defined.
Surjective <a href="https://en.wikipedia.org/wiki/Derivation_(differential_algebra)">derivations</a>
on the space of formal power series will be exactly those adjoint to umbral shifts. Finally,
continuous algebra automorphisms on the algebra of formal power series will be exactly those
adjoint to transfer operators.</p>
<p>Beyond these structural aspects, we’ll also derive many results for working with Sheffer sequences
and polynomial sequences along the way.</p>
<h2 id="conventions">Conventions</h2>
<p>Fix a field |\mathbb K| of characteristic |0|. I’ll write |\mathscr F = \mathbb K[\![t]\!]|
for the |\mathbb K|-algebra of univariate formal power series (with indeterminate |t|) and
|P = \mathbb K[x]| for the |\mathbb K|-algebra of univariate polynomials (with indeterminate |x|).
Further, |\mathrm{Hom}(X, Y)| will be the |\mathbb K|-vector space of |\mathbb K|-linear maps
from a |\mathbb K|-vector space |X| to a |\mathbb K|-vector space |Y|. In particular,
|X^* = \mathrm{Hom}(X,\mathbb K)| is the dual space of |X|.</p>
<p>The four main |\mathbb K|-vector spaces we’ll be focused on are |\mathscr F|, |P|, |P^*|, and
|\mathrm{Hom}(P, P)|. The first two are additionally |\mathbb K|-algebras,
and we’ll find that the third is as well and is, in fact, isomorphic to |\mathscr F| which is
arguably a key enabling fact of umbral calculus. In particular, the |\mathbb K|-algebra structure
induced on |P^*| via |\mathscr F| is called the <strong>umbral algebra</strong>.</p>
<p>Given the above, unsurprisingly, we’ll be talking a lot about formal power series and polynomials.
To save a bit of space and typing for me, unless otherwise specified, if I say |f| is a formal
power series or |f \in \mathscr F|, then that will also define the sequence
|\pseq{f_n}{n}| such that |f(t) = \sum_{k=0}^\infty f_k t^k|. Generally, when
I state something is a sequence it will be a function |\mathbb N \to X| for some |X| and the
parameter will be written as a subscript. So the formal power series |f| also defines a sequence,
also called |f|, from |\mathbb N \to \mathbb K|. (In this case, we could literally identify
formal power series with these sequences.) Similarly, stating |p| is a polynomial or |p \in P|
defines a sequence also called |p| such that |p(x) = \sum_{n=0}^{\deg p}p_n x^n| where
|\deg p| is the <strong>degree of the polynomial</strong>, i.e. the largest value of |n| such that
|p_n \neq 0|. These are also sequences |\mathbb N \to \mathbb K|, and we could identify
polynomials with such sequences that are eventually always |0|. We also have the <strong>degree of a
formal power series</strong> written |\deg f| which is the <em>smallest</em> |k| such that |f_k \neq 0|. Note
that this is <em>dual</em> to the notion for polynomials. It’s clear that |\deg(fg) = \deg f + \deg g|.
Occasionally it will be useful to have |\deg 0 = -\infty| for polynomials and |\deg 0 = \infty|
for formal power series.</p>
<p>Of course, sometimes I will explicitly state something like |f(t) = \sum_{k=0}^\infty a_k t^k|
in which case the sequence |f| is not defined. Usually, this will arise with a formal power series
expression, e.g. |(f \circ g)(t)| so there shouldn’t be any ambiguity. As is typical, I’ll often
say “the formal power series |f(t)|” as opposed to “the formal power series |f|”. Finally, as has
been illustrated, I’ll endeavor to use |k| as the indexing variable for formal power series and
|n| for polynomials, but that won’t always be possible.</p>
<p>The <strong>Kronecker delta</strong> is written |\delta_k^n| and
defined by |\delta_n^n = 1| and |\delta_k^n = 0| for |k \neq n|. This should typically be
thought of as a way of forcing |n| and |k| to be equal, i.e. |f(n)\delta_k^n = f(k)\delta_k^n|
and |\delta_{f(k)}^n = \delta_k^{f^{-1}(n)}| or, equivalently,
|\delta_k^n = \delta_{f(k)}^{f(n)}| for an invertible function |f|.</p>
<h2 id="formal-power-series">Formal Power Series</h2>
<p>I’ll assume familiarity with the basic algebra of <a href="https://en.wikipedia.org/wiki/Formal_power_series">formal power series</a>.
<a href="https://arxiv.org/abs/2205.00879">This lecture</a> gives a nice in-depth and more technical overview,
though it goes far beyond what we’ll need. I’ll recall a few results and fix the terminology and
notation used in this article. These largely follow Roman, but there are many
variations in the literature.</p>
<p><span id="wewp"><strong>Theorem</strong> (id:wewp):</span> <em>|f \in \mathscr F| has a multiplicative inverse, written |f^{-1}|, if and only if
|\deg f = 0|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Write |f(t)g(t) = \sum_{k=0}^\infty c_k t^k| where |c_k = \sum_{i+j=k} f_i g_j|.
Since |c_0 = f_0 g_0|, a multiplicative inverse to |f| can only exist if
|f_0 \neq 0|, and for |g| to be |f^{-1}| we need |g_0 = 1/f_0|. For |k &gt; 0| we need |c_k = 0|,
i.e. |f_0 g_k + \sum_{n=1}^k f_n g_{k-n} = 0|, so |g_k = -f_0^{-1} \sum_{n=1}^k f_n g_{k-n}|. This gives a
recurrence computing all the remaining coefficients of |f^{-1}|.
|\square|
</details>
<p>Thus, |f| being <strong>invertible</strong> will be synonymous with |f| having degree |0|.</p>
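<p>The recurrence from the proof translates directly into code. Below is a minimal Python sketch (the name <code>series_inverse</code> is hypothetical) that represents a formal power series by a list of its leading coefficients and computes the first |N| coefficients of |f^{-1}|. Truncation is harmless here because |g_k| depends only on |f_0, \dots, f_k|.</p>

```python
from fractions import Fraction

def series_inverse(f, N):
    """First N coefficients of 1/f(t), where f is given by its leading coefficients.

    Uses g_0 = 1/f_0 and g_k = -(1/f_0) * sum_{n=1}^{k} f_n g_{k-n} for k = 1, 2, ...
    Coefficients of f that are not supplied are treated as 0.
    """
    f = [Fraction(a) for a in f] + [Fraction(0)] * N   # pad so f[n] always exists
    if f[0] == 0:
        raise ValueError("f has positive degree, so it has no multiplicative inverse")
    g = [1 / f[0]]
    for k in range(1, N):
        g.append(-sum(f[n] * g[k - n] for n in range(1, k + 1)) / f[0])
    return g

# 1/(1 - t) is the geometric series 1 + t + t^2 + ...
geometric = series_inverse([1, -1], 5)
```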
<p>It’s worth noting that |f/g| can be defined even when |g| isn’t invertible, e.g. |t/t = 1|.
If |\deg f \geq \deg g|, then we can cancel out common factors of |t| until the denominator is
invertible.</p>
<p>Suppose |g : \mathbb N \to \mathscr F|, which we’ll write as |g_k(t) \in \mathscr F|,
such that |\deg g_k \to \infty| as |k \to \infty|. Given any |a : \mathbb N \to \mathbb K|,
the sum |\sum_{k=0}^\infty a_k g_k(t)| is a well-defined element of |\mathscr F|. In particular,
if |\deg g &gt; 0|, then |g_k(t) = g(t)^k| satisfies the conditions. If we have |\deg g_k = k|,
which is the case when |g_k(t) = g(t)^k| for |g| with degree |1| for example, then |g| forms a
<strong>pseudobasis</strong><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> for |\mathscr F| meaning for any
|f \in \mathscr F|, there exists a unique sequence |a| such that
|f(t) = \sum_{k=0}^\infty a_k g_k(t)|. A series |f| with |\deg f = 1| is called a
<strong>delta series</strong>. Every delta series gives rise to a pseudobasis of |\mathscr F|.</p>
<p>If |f, g \in \mathscr F| and |\deg g &gt; 0|, then we can thus form the composition
|f(g(t)) = \sum_{k=0}^\infty f_k g(t)^k|. It’s clear that |\deg(f\circ g) = \deg f\deg g|.</p>
<p><span id="cjme"><strong>Theorem</strong> (id:cjme):</span> <em>A series, |f|, has a <strong>compositional inverse</strong>, written |\bar f|, meaning
|f(\bar f(t)) = t = \bar f(f(t))| if and only if |\deg f = 1|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>Suppose |g \in \mathscr F| such that |f(g(t)) = t|. By taking degrees, we
immediately see that |f| (and |g|) need to be of degree exactly one to have any chance for |g| to be
a compositional inverse to |f|. If |g(t)^k = \sum_{n=0}^\infty b_{k,n} t^n|, then clearly we
need |f_1 b_{1,1} = 1|. |g(t)^{k+1} = g(t)g(t)^k| implies
|b_{k+1,n} = \sum_{i+j=n} b_{1,i}b_{k,j}|. But note that |b_{k,n} = 0| for all |n &lt; k|
since |\deg g(t)^k = k| so this sum always has |k \leq j &lt; n| and thus
|1 \leq i \leq n-k|. Because the |n|-th coefficient of |f(g(t))| is |0| for |n &gt; 1|,
\[0 = \sum_{k=1}^n f_k b_{k,n} = f_1 b_{1,n} + \sum_{k=2}^n f_k b_{k,n}\]
Expanding |b_{k,n}| in the last sum shows that it only involves |b_{1,i}| for |i &lt; n|.
Since |\deg f = 1|, |f_1 \neq 0|, and we can thus solve for |b_{1,n}| iteratively.</p>
Alternatively, simply note that if
|f(t) = th(t)| and |g(t) = tk(t)| then |h| and |k| have degree |0| and
|t = f(g(t)) = t k(t)h(g(t))|. So |k(t) = h(g(t))^{-1} = h(tk(t))^{-1}|. |h| is invertible and
|tk(t)| clearly has degree greater than |0| so the composition is well-defined and invertible.
Unfolding this expression for |k(t)| shows that the |i|th coefficient only depends on earlier
coefficients. |\square|
</details>
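<p>The first argument in the proof is effectively an algorithm: set |g_1 = 1/f_1|, then choose each later coefficient |g_n| so that the |n|th coefficient of |f(g(t))| vanishes. This works because that coefficient is |f_1 g_n| plus terms involving only |g_1, \dots, g_{n-1}|. A rough Python sketch with truncated coefficient lists (all helper names are hypothetical):</p>

```python
from fractions import Fraction

def mul(a, b, N):
    """Truncated product of two power series (coefficient lists), keeping N terms."""
    out = [Fraction(0)] * N
    for i, ai in enumerate(a[:N]):
        for j, bj in enumerate(b[:N - i]):
            out[i + j] += ai * bj
    return out

def compose(f, g, N):
    """First N coefficients of f(g(t)) = sum_k f_k g(t)^k; requires g_0 = 0."""
    assert g[0] == 0, "composition needs deg g to be positive"
    out = [Fraction(0)] * N
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)   # g(t)^0
    for fk in f[:N]:   # g(t)^k has degree at least k, so later terms vanish mod t^N
        out = [o + fk * p for o, p in zip(out, power)]
        power = mul(power, g, N)
    return out

def compositional_inverse(f, N):
    """First N coefficients of the compositional inverse of a delta series f."""
    f = [Fraction(a) for a in f] + [Fraction(0)] * N
    assert f[0] == 0 and f[1] != 0, "f must be a delta series"
    g = [Fraction(0), 1 / f[1]] + [Fraction(0)] * (N - 2)
    for n in range(2, N):
        # With g_n still 0, the n-th coefficient of f(g(t)) is exactly the part not involving g_n.
        g[n] = -compose(f, g, n + 1)[n] / f[1]
    return g[:N]

# e^t - 1 truncated; its compositional inverse should be log(1 + t).
exp_minus_one = [0, 1, Fraction(1, 2), Fraction(1, 6), Fraction(1, 24), Fraction(1, 120)]
log_one_plus = compositional_inverse(exp_minus_one, 6)
```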
<p>A useful result linking these two together is that, if |\deg f = 1|, then
\[ 1 = t' = [\bar f(f(t))]' = \bar f'(f(t))f'(t) \]
where |f'(t)| is the ordinary derivative of formal power series. In other words,
|f'(t)^{-1} = \bar f'(f(t))|.</p>
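<p>For a concrete instance, take |f(t) = e^t - 1| with the ordinary derivative. Its compositional
inverse is |\bar f(t) = \log(1+t) = \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k}t^k|, so |\bar f'(t) = (1+t)^{-1}| and
\[ \bar f'(f(t)) = \frac{1}{1 + f(t)} = \frac{1}{e^t} = e^{-t} = f'(t)^{-1} \]
as the identity predicts.</p>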
<h2 id="linear-functionals">Linear Functionals</h2>
<p>For |L \in P^*| and |p \in P|, we’ll write |\pair{L}{p(x)}| for the action of
|L| on |p|. Any such |L| is uniquely defined by its values on |x^n| for all |n\in\mathbb N|.</p>
<p>If |c : \mathbb N \to \mathbb K\setminus \{0\}|, we can define for each |f \in \mathscr F| a
linear functional which we’ll also write as |f| or |f(t)| via
\[\pair{f(t)}{x^n} = c_n f_n\] Really, we should write something like
|\pair{f(t)}{p(x)}_c| to indicate the dependence on |c|. This play on notation is unambiguous
since |f(t) = g(t)| if and only if |\pair{f(t)}{x^n} = \pair{g(t)}{x^n}| for all |n|,
i.e. |f| and |g| are equal as power series if and only if the induced linear functionals are equal.</p>
<p>Notable choices for |c| are:</p>
<ul>
<li>|c_n = n!| is the most traditional case.</li>
<li>|c_n = 1|</li>
<li>|c_n = 1/{\lambda \choose n}| for |\lambda| not a non-negative integer (so that no |{\lambda \choose n}| is |0|).</li>
</ul>
<p>The definition of the linear functional induced by |f \in \mathscr F| implies that
|\pair{t^k}{x^n} = c_n\delta_k^n|. This leads to
\[\bigpair{\sum_{n=0}^\infty a_n t^n}{p(x)}
= \sum_{n=0}^\infty a_n \pair{t^n}{p(x)} \] where the right-hand side
is well-defined because only finitely many of the terms of the sum will be non-zero.
(We can generalize to allow <a href="https://en.wikipedia.org/wiki/Laurent_series">Laurent series</a> with
only finitely many negative powers on the left and Laurent series with only finitely many positive
powers on the right.)</p>
<p>We can articulate L’Hôpital’s rule with this notation as: if |\deg f \geq \deg g &gt; 0|, then
\[\pair{f(t)/g(t)}{x^0} = \pair{f'(t)/g'(t)}{x^0} \]</p>
<p>We can explicitly write the formal power series, |f_L|, corresponding to the linear functional,
|L|, as \[f_L(t) = \sum_{k=0}^\infty \frac{\pair{L}{x^k}}{c_k}t^k\]
It is trivial to verify that the linear functional induced by |f_L| is |L|. This gives an
isomorphism |\mathscr F \cong P^*| as |\mathbb K|-vector spaces. Moreover, the algebra
structure on |\mathscr F| then induces an algebra structure on |P^*|. We can compute
\[\pair{f(t)g(t)}{x^n}
= \sum_{i+j=n}\frac{c_n}{c_i c_j}\pair{f(t)}{x^i}\pair{g(t)}{x^j}\]</p>
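<p>To see the induced product concretely, here is a small Python sketch (the function name is hypothetical) that multiplies two linear functionals given only by their values on |x^0, \dots, x^{N-1}|, using the formula above with |c| passed in as a function. Taking the functionals |p \mapsto p(a)| and |p \mapsto p(b)|, whose values on |x^n| are |a^n| and |b^n|, the product with |c_n = n!| has values |(a+b)^n| by the binomial theorem.</p>

```python
from fractions import Fraction
from math import factorial

def umbral_product(Lf, Lg, c):
    """Values <f(t)g(t) | x^n> for n = 0..N-1, from Lf[n] = <f(t) | x^n> and Lg[n] = <g(t) | x^n>.

    Implements <f g | x^n> = sum_{i+j=n} c_n / (c_i c_j) <f | x^i> <g | x^j>,
    where c is a function n -> c_n with every c_n nonzero.
    """
    N = min(len(Lf), len(Lg))
    return [
        sum(Fraction(c(n)) / (c(i) * c(n - i)) * Lf[i] * Lg[n - i] for i in range(n + 1))
        for n in range(N)
    ]

a, b = 2, 3
Lf = [a ** n for n in range(6)]            # values of p -> p(a) on x^n
Lg = [b ** n for n in range(6)]            # values of p -> p(b) on x^n
prod_nfact = umbral_product(Lf, Lg, factorial)   # equals [(a + b) ** n for n in range(6)]
```

<p>With |c_n = 1| the same inputs give |\sum_{i+j=n} a^i b^j| instead, which shows how the algebra structure on |P^*| depends on |c|.</p>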
<p>We’ll call |L| a <strong>delta</strong>/<strong>invertible functional</strong> when it corresponds to a delta/invertible
power series.</p>
<h3 id="some-properties">Some Properties</h3>
<p><span id="yiyi"><strong>Proposition</strong> (id:yiyi):</span> <em>If |\deg f &gt; \deg p| then |\pair{f(t)}{p(x)} = 0|.</em><br/>
<strong>Proof</strong>: Let |N = \deg p|.
|\pair{f(t)}{p(x)} = \sum_{n=0}^N p_n\pair{f(t)}{x^n} = \sum_{n=0}^N p_n f_n c_n| but
|f_n = 0| for all |n &lt; \deg f|, which includes every |n \leq N|. |\square|</p>
<p><span id="pove"><strong>Proposition</strong> (id:pove):</span> <em>If |\deg p_n(x) = n| and |\pair{f(t)}{p_n(x)} = 0| for all |n \in \mathbb N|, then |f(t) = 0|.</em><br/>
<strong>Proof</strong>: If |f(t) = \sum_{k=K}^\infty f_k t^k|, then
|\pair{f(t)}{p_K(x)} = p_{K,K}\pair{f(t)}{x^K} = p_{K,K} f_K c_K = 0| so |f_K = 0| and
we can repeat with |K+1|. |\square|</p>
<p><span id="fpoz"><strong>Proposition</strong> (id:fpoz):</span> <em>|\pair{f(at)}{p(x)} = \pair{f(t)}{p(ax)}|</em><br/>
<strong>Proof</strong>: Follows immediately from
|\pair{a^n t^n}{x^n} = a^n c_n = \pair{t^n}{a^n x^n}|. |\square|</p>
<p><span id="nxqu"><strong>Proposition</strong> (id:nxqu):</span> <em>|p(x) = \sum_{n=0}^\infty \frac{\pair{t^n}{p(x)}}{c_n}x^n|</em><br/>
<strong>Proof</strong>: Just expand |p(x)| as |\sum_{n=0}^{\deg p} p_n x^n| on both sides and simplify.
|\square|</p>
<p><span id="rnhv"><strong>Proposition</strong> (id:rnhv):</span> <em>If |\deg f_k = k| and |\pair{f_k(t)}{p(x)} = 0| for all |k \in \mathbb N|, then |p(x) = 0|.</em><br/>
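<strong>Proof</strong>: Suppose |p(x) \neq 0| and let |k = \deg p|, so |p(x) = \sum_{n=0}^k p_n x^n| with |p_k \neq 0|.
Writing |f_k(t) = \sum_{j=k}^\infty a_j t^j| with |a_k \neq 0|, we get
|\pair{f_k(t)}{p(x)} = \pair{a_k t^k}{p_k x^k} = a_k p_k c_k = 0|, so |p_k = 0|, a contradiction.
|\square|</p>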
<p>Propositions <a href="#pove">(id:pove)</a> and <a href="#rnhv">(id:rnhv)</a> will often be invoked
tacitly to show that two formal power series or polynomials are equal. For example, choose
|p(x) = r(x) - s(x)| in <a href="#rnhv">proposition (id:rnhv)</a>. In particular, we will often
just prove something like |\pair{t^k}{r(x)} = \pair{t^k}{s(x)}| with |k| implicitly being
arbitrary to conclude |r(x) = s(x)|.</p>
<h3 id="evaluation-functional">Evaluation Functional</h3>
<p>We always have the <strong>evaluation functional</strong> |\varepsilon_y| for |y \in \mathbb K| defined by
\[\pair{\varepsilon_y(t)}{p(x)} = p(y)\] Note that this definition doesn’t depend on the choice
of |c|. We quickly compute |\pair{\varepsilon_y(t)}{x^n} = y^n| so
\[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{y^k}{c_k}t^k\]</p>
<p>When |c_n = n!|, then |\varepsilon_y(t) = e^{yt}|.</p>
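<p>A quick numerical check of the formula for |\varepsilon_y(t)|: the Python sketch below (helper names hypothetical) builds the truncated series |\sum_k \frac{y^k}{c_k} t^k| and pairs it with a polynomial via |\pair{f(t)}{p(x)} = \sum_n c_n f_n p_n|. The result is |p(y)| for every nonzero choice of |c| tried, matching the remark that the functional itself does not depend on |c|.</p>

```python
from fractions import Fraction
from math import factorial

def pair(f, p, c):
    """<f(t) | p(x)> = sum_n c_n f_n p_n; only n up to deg p contribute."""
    return sum(Fraction(c(n)) * f[n] * p[n] for n in range(len(p)))

def evaluation_series(y, N, c):
    """First N coefficients of epsilon_y(t) = sum_k y^k / c_k t^k."""
    return [Fraction(y) ** k / c(k) for k in range(N)]

p = [1, -2, 0, 3]   # p(x) = 1 - 2x + 3x^3, so p(5) = 366
for c in (factorial, lambda n: 1, lambda n: Fraction(1, n + 1)):
    assert pair(evaluation_series(5, len(p), c), p, c) == 366
```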
<h3 id="formal-derivative">Formal Derivative</h3>
<p>The <strong>formal derivative</strong> of |f \in \mathscr F|, written |\partial_t f(t)|, is
defined as \[\partial_t t^k = \begin{cases}
\frac{c_k}{c_{k-1}}t^{k-1}, &amp; k &gt; 0 \\
0, &amp; k = 0
\end{cases}\] which leads to the key property
|\pair{\partial_t f(t)}{p(x)} = \pair{f(t)}{xp(x)}|.</p>
<p>As an example, we immediately compute that |\partial_t\varepsilon_y(t) = y\varepsilon_y(t)|.</p>
<p>We will also use the ordinary derivative of formal power series, which we’ll notate with
|f'(t)|. The formal derivative and the ordinary derivative coincide when |c_n = n!| as
suggested by the previous example.</p>
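<p>The key property |\pair{\partial_t f(t)}{p(x)} = \pair{f(t)}{xp(x)}| is easy to spot-check. In this Python sketch (names hypothetical), series and polynomials are coefficient lists, lowest degree first, and multiplying |p(x)| by |x| just shifts its coefficients up by one.</p>

```python
from fractions import Fraction
from math import factorial

def pair(f, p, c):
    """<f(t) | p(x)> = sum_n c_n f_n p_n (f needs at least len(p) coefficients)."""
    return sum(Fraction(c(n)) * f[n] * p[n] for n in range(len(p)))

def formal_derivative(f, c):
    """partial_t t^k = (c_k / c_{k-1}) t^{k-1} for k = 1, 2, ..., and partial_t t^0 = 0."""
    return [Fraction(c(k)) / c(k - 1) * f[k] for k in range(1, len(f))]

f = [3, 1, 4, 1, 5, 9]   # an arbitrary truncated series
p = [2, 7, 1]            # p(x) = 2 + 7x + x^2
xp = [0] + p             # x p(x)
for c in (factorial, lambda n: 1, lambda n: Fraction(1, n + 1)):
    assert pair(formal_derivative(f, c), p, c) == pair(f, xp, c)
```

<p>With |c_n = n!| the function reduces to ordinary term-by-term differentiation.</p>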
<h2 id="linear-operators">Linear Operators</h2>
<p>We’ve identified formal power series with linear functionals on |P|. Next, we want to identify them
with linear operators on |P|. We’re clearly not going to get an isomorphism in this case as
multiplication (i.e. composition) of linear operators doesn’t commute in general, while
multiplication of formal power series does. Nevertheless, we will derive simple characterizations
of which linear operators are of this form.</p>
<p>One of the most important properties we will want is the following <strong>adjointness property</strong>:
\[ \pair{f(t)g(t)}{p(x)} = \pair{f(t)}{g(t)p(x)} \] where |g(t) p(x)| is the action of the
linear operator induced by |g(t)| on |p(x)|. We can derive what the induced linear operator must
be to satisfy this property.</p>
<p>If |k \leq m| and |k \leq n|, then</p>
<p>\[\begin{align}
\pair{t^{m-k}}{t^k x^n}
&amp; = \pair{t^{m-k} t^k}{x^n} \\
&amp; = \pair{t^m}{x^n} \\
&amp; = c_n \delta_m^n \\
&amp; = c_n \delta_{m-k}^{n-k} \\
&amp; = c_n \frac{\pair{t^{m-k}}{x^{n-k}}}{c_{n-k}} \\
&amp; = \bigpair{t^{m-k}}{\frac{c_n}{c_{n-k}} x^{n-k}}
\end{align}\]
so |t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k}| for |k \leq n| and |0| otherwise. Thus,
\[f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k} \]
which can be extended to all polynomials by linearity.</p>
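<p>The formula for |f(t) x^n| can be implemented directly. The following Python sketch (function name hypothetical) applies a truncated series to a polynomial's coefficient list. With |c_n = n!|, the series |t| acts as |D| and |\sum_k \frac{y^k}{k!} t^k| acts as the shift |p(x) \mapsto p(x+y)|; with |c_n = 1|, |t| drops the constant term and divides by |x|.</p>

```python
from fractions import Fraction
from math import factorial

def apply_series(f, p, c):
    """Coefficients of f(t) p(x), using f(t) x^n = sum_{k=0}^{n} c_n / c_{n-k} f_k x^{n-k}.

    f must supply at least len(p) coefficients (higher ones cannot contribute).
    """
    out = [Fraction(0)] * len(p)
    for n, pn in enumerate(p):
        for k in range(n + 1):
            out[n - k] += Fraction(c(n)) / c(n - k) * f[k] * pn
    return out

p = [1, -2, 0, 3]                                          # 1 - 2x + 3x^3
t = [0, 1, 0, 0]                                           # the series t
shift2 = [Fraction(2) ** k / factorial(k) for k in range(4)]  # epsilon_2(t) when c_n = n!

Dp = apply_series(t, p, factorial)          # coefficients of -2 + 9x^2
p_shifted = apply_series(shift2, p, factorial)  # coefficients of p(x + 2) = 21 + 34x + 18x^2 + 3x^3
p_div_x = apply_series(t, p, lambda n: 1)   # coefficients of (p(x) - p(0)) / x = -2 + 3x^2
```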
<p>This should look familiar: |tx^n = \frac{c_n}{c_{n-1}}x^{n-1}|, which is exactly the same
formula as the one for |\partial_t| except this operates on polynomials while |\partial_t|
operates on formal power series. In particular, when |c_n = n!|, |t| behaves exactly like the
derivative of polynomials with respect to |x|, and we see that the formal power series pick out a
special class of differential operators on polynomials.</p>
<p>A simple calculation shows that |(t^j t^k) x^n = t^j (t^k x^n)| which lifts to a general
associativity law: |(f(t) g(t)) p(x) = f(t) (g(t) p(x))|. The adjointness property also immediately
implies that the induced linear operators commute.</p>
<p>As before, we will say <strong>delta</strong>/<strong>invertible operator</strong> when the linear operator is induced by
a delta/invertible formal power series.</p>
<p>Define |Dx^n = nx^{n-1}|, |D^{-1}x^n = \frac{1}{n+1}x^{n+1}|, and
|x^{-1} x^n = \begin{cases}x^{n-1}, &amp; n &gt; 0 \\ 0, &amp; n = 0\end{cases}|. Then for various
choices of |c|, |t| behaves as the following linear operators:</p>
<ul>
<li>|c_n = n!| implies |t = D|</li>
<li>|c_n = 1| implies |t = x^{-1}|</li>
<li>|c_n = (n!)^{m+1}| implies |t = (Dx)^m D|</li>
<li>|c_n = 1/(-\lambda)_{(n)}| implies |t = -(\lambda + xD)^{-1} x^{-1}|</li>
<li>|c_n = 1/n!| implies |t = x^{-1} D^{-1} x^{-1}|</li>
<li>|c_n = 1/{-\lambda \choose n}| implies |t = -(\lambda + xD)^{-1} D|</li>
<li>|c_n = 2^{2n}(1+\alpha)^{(n)}/(1 + \alpha + \beta)^{(2n)}| implies
|t = 4(1 + \alpha + \beta + 2xD)^{-1} (2 + \alpha + \beta + 2xD)^{-1} x^{-1} (\alpha + xD)|</li>
<li>|c_n = (1-q)^{-n} \prod_{k=1}^n (1-q^k)| implies |tp(x) = (p(qx) - p(x))/(qx - x)|</li>
</ul>
<p>Here |x_{(n)}| and |x^{(n)}| are the
<a href="https://en.wikipedia.org/wiki/Falling_and_rising_factorials">falling and rising factorials</a>
respectively.</p>
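<p>Several entries of this list can be spot-checked mechanically: each listed operator sends |x^n| to a scalar multiple of |x^{n-1}|, and that scalar must equal |c_n/c_{n-1}|. The Python sketch below (all names hypothetical) represents a term |a x^m| as a pair |(a, m)|, implements |D|, |x|, |x^{-1}|, and "diagonal" operators such as |(s + xD)^{-1}| on such terms, and compares. The parameter values are arbitrary test choices.</p>

```python
from fractions import Fraction
from math import factorial, prod

# A monomial a*x^m is represented as the pair (a, m); each operator maps pairs to pairs.
def D(a, m):
    return (a * m, m - 1)              # D x^m = m x^{m-1}

def X(a, m):
    return (a, m + 1)                  # multiplication by x

def X_inv(a, m):
    return (a if m else 0, m - 1)      # x^{-1} x^m = x^{m-1}, and x^{-1} x^0 = 0

def diag(h):
    return lambda a, m: (a * h(m), m)  # scales x^m by h(m), e.g. (s + xD)^{-1} has h(m) = 1/(s + m)

def compose(*ops):
    """Operator product, applied right to left as in the list above."""
    def op(a, m):
        for g in reversed(ops):
            a, m = g(a, m)
        return (a, m)
    return op

def falling(x, n):
    return prod((x - i for i in range(n)), start=Fraction(1))   # x_{(n)}

def rising(x, n):
    return prod((x + i for i in range(n)), start=Fraction(1))   # x^{(n)}

lam, al, be, m = Fraction(1, 2), Fraction(1, 3), Fraction(2, 5), 2   # arbitrary parameters
neg = diag(lambda k: -1)
cases = {
    "n!": (factorial, D),
    "1": (lambda n: 1, X_inv),
    "(n!)^(m+1)": (lambda n: factorial(n) ** (m + 1), compose(*([D, X] * m), D)),
    "1/(-lam)_(n)": (lambda n: 1 / falling(-lam, n),
                     compose(neg, diag(lambda k: 1 / (lam + k)), X_inv)),
    "1/C(-lam,n)": (lambda n: factorial(n) / falling(-lam, n),
                    compose(neg, diag(lambda k: 1 / (lam + k)), D)),
    "Jacobi": (lambda n: 4 ** n * rising(1 + al, n) / rising(1 + al + be, 2 * n),
               compose(diag(lambda k: 4 / (1 + al + be + 2 * k)),
                       diag(lambda k: 1 / (2 + al + be + 2 * k)),
                       X_inv, diag(lambda k: al + k))),
}

for name, (c, t) in cases.items():
    for n in range(1, 7):
        assert t(Fraction(1), n) == (Fraction(c(n)) / c(n - 1), n - 1), name
```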
<p>A linear operator |T| on |\mathscr F| is <strong>continuous</strong> if given a sequence of formal power series
|\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have
|\deg T(f_k) \to \infty| as |k \to \infty|.</p>
<p><span id="fqys"><strong>Theorem</strong> (id:fqys):</span> <em>If |T| is a continuous linear operator on |\mathscr F|, then
\[ T\left(\sum_{k=0}^\infty a_k f_k(t)\right) = \sum_{k=0}^\infty a_k T(f_k(t)) \]
for all sequences |\pseq{a_k}{k}| in |\mathbb K| and
|\pseq{f_k}{k}| in |\mathscr F| for which |\deg f_k \to \infty|
as |k \to \infty|. In particular, a continuous linear operator is completely determined by its
action on the elements of a pseudobasis.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
By the assumptions, both |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n}| and
|\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n}| involve only finitely many terms of the sum.
That is, for every |n| there is some |N| such that for all |m &gt; N|,
|\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n}
= \pair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n}|
and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} = \pair{\sum_{k=0}^m a_k T(f_k(t))}{x^n}|.
But
\[\begin{align}
\bigpair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n}
&amp; = \bigpair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n} \\
&amp; = \bigpair{\sum_{k=0}^m a_k T(f_k(t))}{x^n} \\
&amp; = \bigpair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n}
\end{align}\]
so these two linear functionals agree on every |x^n|, a basis of |P|, and thus are the same, which
implies the formal power series are the same as well.
|\square|
</details>
<p>This can be cast as an instance of topological continuity, but I won’t describe that here.</p>
<h3 id="evaluation-operator">Evaluation Operator</h3>
<p>Unlike the linear functional case, the linear operator induced by the formal power series
corresponding to the evaluation functional does depend on the choice of |c|. In general, we have:
\[\varepsilon_y(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k} c_k} y^k x^{n-k} \]</p>
<p>For |c_n = n!|, |\varepsilon_y(t) p(x) = p(x + y)|. For |c_n = 1|,
|\varepsilon_y(t) x^n = \frac{x^{n+1} - y^{n+1}}{x-y}|.</p>
<h3 id="characterizing-linear-operators-induced-from-formal-power-series">Characterizing Linear Operators Induced from Formal Power Series</h3>
<p><span id="lgtb"><strong>Theorem</strong> (id:lgtb):</span> <em>Let |U| be a linear operator on |P|. There is an |f \in \mathscr F| such that
|Up(x) = f(t) p(x)| for all |p \in P| if and only if |U| commutes with the operator |t|, i.e.
|Utp(x) = tUp(x)| for all |p \in P|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
The only if direction is obvious. For the other direction, first note that
|\deg Ux^n \leq n| because |t^k Ux^n = Ut^k x^n = 0| if |k &gt; n| so |\pair{t^k}{Ux^n} = 0|
for all |k &gt; n|. Now define
\[f(t) = \sum_{k=0}^\infty \frac{\pair{t^0}{Ux^k}}{c_k}t^k \]
Then, \[\begin{align}
f(t) x^n
&amp; = \sum_{k=0}^n \frac{\pair{t^0}{Ux^k}}{c_k}t^k x^n \\
&amp; = \sum_{k=0}^n \frac{c_n}{c_k c_{n-k}} \pair{t^0}{Ux^k} x^{n-k} \\
&amp; = \sum_{k=0}^n \frac{\pair{t^0}{Ut^{n-k} x^n}}{c_{n-k}} x^{n-k} \\
&amp; = \sum_{k=0}^n \frac{\pair{t^0}{t^{n-k} Ux^n}}{c_{n-k}} x^{n-k} \\
&amp; = \sum_{k=0}^n \frac{\pair{t^{n-k}}{Ux^n}}{c_{n-k}} x^{n-k} \\
&amp; = \sum_{k=0}^n \frac{\pair{t^k}{Ux^n}}{c_k} x^k \\
&amp; = Ux^n
\end{align}\]
The last equality relies on the degree of |Ux^n| being less than or equal to |n|.
|\square|
</details>
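<p>The proof is constructive: |f_k = \pair{t^0}{Ux^k}/c_k = c_0 q_0/c_k|, where |q_0| is the constant term of |q(x) = Ux^k|. The Python sketch below (names hypothetical) recovers |f| from an operator given as a black box on coefficient lists. As a test operator it uses the shift |p(x) \mapsto p(x+y)|, which commutes with |D|; with |c_n = n!| the recovered coefficients should be |y^k/k!|.</p>

```python
from fractions import Fraction
from math import comb, factorial

def series_of_operator(U, c, N):
    """First N coefficients of f with U = f(t), assuming U commutes with t.

    Uses f_k = <t^0 | U x^k> / c_k and <t^0 | q(x)> = c_0 q_0.
    U takes and returns polynomial coefficient lists (lowest degree first).
    """
    f = []
    for k in range(N):
        xk = [0] * k + [1]                                  # the monomial x^k
        f.append(Fraction(c(0)) * U(xk)[0] / c(k))
    return f

def shift(y):
    """The operator p(x) -> p(x + y), computed with the binomial theorem."""
    def U(p):
        out = [Fraction(0)] * len(p)
        for n, pn in enumerate(p):
            for j in range(n + 1):                          # x^n -> sum_j C(n, j) y^(n-j) x^j
                out[j] += comb(n, j) * Fraction(y) ** (n - j) * pn
        return out
    return U

f = series_of_operator(shift(3), factorial, 6)              # expected: 3^k / k!
```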
<p><span id="viti"><strong>Corollary</strong> (id:viti):</span> <em>A linear operator on |P| has the form of |f(t)| for an |f \in \mathscr F| if and
only if it commutes with some (equivalently, every) delta operator.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
The sequence of powers of the formal power series associated with the delta
operator forms a pseudobasis, which means we can write |t| as an infinite linear combination of them.
Thus the linear operator commutes with |t| and we can apply the <a href="#lgtb">theorem</a>.
|\square|
</details>
<p><span id="qllw"><strong>Corollary</strong> (id:qllw):</span> <em>A linear operator on |P| has the form of |f(t)| for an |f \in \mathscr F| if and
only if it commutes with |\varepsilon_y(t)| for all |y \in \mathbb K|.</em></p>
<p><strong>Proof</strong>: For |y \neq 0|, |\varepsilon_y(t) - c_0^{-1} t^0| is a delta operator, and any operator commuting with |\varepsilon_y(t)| also commutes with it. |\square|</p>
<h2 id="polynomial-sequences">Polynomial Sequences</h2>
<p>When we say |\pseq{p_n(x)}{n}| is a (<strong>polynomial</strong>) <strong>sequence</strong>, that will always
mean that |\deg p_n(x) = n|.</p>
<p><span id="cgsr"><strong>Theorem</strong> (id:cgsr):</span> <em>Let |f| be a delta series and |g| be an invertible series. Then there is a unique
polynomial sequence |\pseq{s_n(x)}{n}| such that
\[\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n \]
holds for all |n,k \in \mathbb N|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>Uniqueness follows easily by considering |\pair{g(t)f(t)^k}{s_n(x) - r_n(x)} = 0|
where |\pseq{r_n(x)}{n}| is another sequence satisfying the same property.</p>
For existence, we can just brute force it. If |g(t)f(t)^k = \sum_{i=k}^\infty b_{k,i} t^i|
and we set |s_n(x) = \sum_{j=0}^n a_{n,j} x^j|, then we want to solve for the |a_{n,j}|
induced by the following triangular system of linear equations:
\[\begin{align}
c_n\delta_k^n
&amp; = \pair{g(t)f(t)^k}{s_n(x)} \\
&amp; = \bigpair{\sum_{i=k}^\infty b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\
&amp; = \bigpair{\sum_{i=k}^n b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\
&amp; = \sum_{i=k}^n \sum_{j=0}^n b_{k,i} a_{n,j} \pair{t^i}{x^j} \\
&amp; = \sum_{i=k}^n c_i b_{k,i} a_{n,i}
\end{align}\]
|b_{k,k} \neq 0| since |\deg g(t)f(t)^k = k| and |c_n \neq 0| by assumption. Therefore, the
diagonal entries of the triangular matrix corresponding to this system of linear equations are
non-zero, and thus the matrix is invertible.
|\square|
</details>
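<p>The existence proof doubles as an algorithm: expand |g(t)f(t)^k| to get the |b_{k,i}|, then back-substitute in the triangular system. Below is a Python sketch of that procedure (names hypothetical, truncated series as coefficient lists). With |c_n = n!|, |g(t) = (e^t - 1)/t| (coefficients |1/(k+1)!|) and |f(t) = t|, it should produce the Bernoulli polynomials from the introduction; with |g(t) = 1| and |f(t) = e^t - 1|, it should produce the falling factorials |x_{(n)}|.</p>

```python
from fractions import Fraction
from math import factorial

def mul(a, b, N):
    """Truncated product of two power series (coefficient lists), keeping N terms."""
    out = [Fraction(0)] * N
    for i, ai in enumerate(a[:N]):
        for j, bj in enumerate(b[:N - i]):
            out[i + j] += ai * bj
    return out

def sheffer(g, f, c, N):
    """Coefficient lists (lowest degree first) of s_0, ..., s_{N-1}, Sheffer for (g, f).

    Solves sum_{i=k}^{n} c_i b_{k,i} a_{n,i} = c_n delta_k^n by back-substitution,
    where g(t) f(t)^k = sum_i b_{k,i} t^i. Assumes deg g = 0 and deg f = 1.
    """
    def pad(s):
        return ([Fraction(v) for v in s] + [Fraction(0)] * N)[:N]
    g, f = pad(g), pad(f)
    b, cur = [], g
    for k in range(N):                    # b[k] holds g(t) f(t)^k mod t^N
        b.append(cur)
        cur = mul(cur, f, N)
    seqs = []
    for n in range(N):
        a = [Fraction(0)] * (n + 1)
        for k in range(n, -1, -1):        # row k only involves a_{n,k}, ..., a_{n,n}
            rhs = Fraction(c(n)) if k == n else Fraction(0)
            rhs -= sum(c(i) * b[k][i] * a[i] for i in range(k + 1, n + 1))
            a[k] = rhs / (c(k) * b[k][k])
        seqs.append(a)
    return seqs

N = 5
bernoulli = sheffer([Fraction(1, factorial(k + 1)) for k in range(N)], [0, 1], factorial, N)
falling = sheffer([1], [0] + [Fraction(1, factorial(k)) for k in range(1, N)], factorial, N)
# bernoulli[2] == [1/6, -1, 1], i.e. B_2(x) = x^2 - x + 1/6
# falling[3] == [0, 2, -3, 1], i.e. x_(3) = x^3 - 3x^2 + 2x
```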
<p>We’ll say |\pseq{s_n(x)}{n}| is the <strong>Sheffer sequence</strong> or is <strong>Sheffer</strong> for
the pair |(g, f)|. When |g(t) = 1|, then we say that the corresponding Sheffer sequence is the
<strong>associated sequence</strong> to |f|. When |f(t) = t|, then we say that the corresponding Sheffer sequence
is the <strong>Appell sequence</strong> for |g|. Often I’ll use |\pseq{p_n(x)}{n}| for
associated sequences, i.e. when |g(t) = 1|, and reserve |\pseq{s_n(x)}{n}| for
the general case. The idea is that if |g(t)f(t)^k| takes the place of |t^k|, then |s_n(x)|
takes the place of |x^n|. This is illustrated by the defining property and the following theorems.</p>
<p><span id="ExpansionTheorem"><strong>Theorem</strong> (Expansion Theorem):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.
Then for any |h \in \mathscr F|,
\[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
|\pseq{s_n(x)}{n}|, like any polynomial sequence, is a basis for |P|, so two linear
functionals are equal if they agree on all |s_n(x)|. Clearly,
\[\begin{align}
\bigpair{\sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)}
&amp; = \bigpair{\sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} \\
&amp; = \sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}\pair{g(t)f(t)^k}{s_n(x)} \\
&amp; = \pair{h(t)}{s_n(x)}
\end{align}\]
|\square|
</details>
<p><span id="PolynomialExpansionTheorem"><strong>Corollary</strong> (Polynomial Expansion Theorem):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.
Then for any |p \in P|,
\[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Choose |h = \varepsilon_y| in the <a href="#ExpansionTheorem">Expansion Theorem</a>,
then apply it as a linear functional to |p|. (Note this proof relies on |\mathbb K| having
characteristic |0| so that |p(y) = q(y)| as functions |\mathbb K \to \mathbb K| implies |p = q|
as polynomials.)
|\square|
</details>
<p><span id="GeneratingFunction"><strong>Theorem</strong> (Generating Function):</span> <em>|\pseq{s_n(x)}{n}| is
Sheffer for |(g, f)| if and only if
\[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t))
= \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \]
for all |y \in \mathbb K|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>For the forward implication, using the <a href="#ExpansionTheorem">Expansion Theorem</a> we have:
\[ \varepsilon_y(t)
= \sum_{k=0}^\infty \frac{\pair{\varepsilon_y}{s_k(x)}}{c_k} g(t) f(t)^k
= \sum_{k=0}^\infty \frac{s_k(y)}{c_k} g(t) f(t)^k \]
Substituting |\bar f(t)| for |t| and dividing both sides by |g(\bar f(t))| gives the result.</p>
For the reverse implication, if |\pseq{r_n(x)}{n}| is the Sheffer sequence for |(g, f)| then we
immediately get from the forward implication
\[ \sum_{k=0}^\infty \frac{r_k(y)}{c_k} t^k
= \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t))
= \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \]
and applying both sides to |x^n| then gives |s_n(y) = r_n(y)| for all |y \in \mathbb K|.
(Again, this proof relies on the characteristic of |\mathbb K| being |0|.)
|\square|
</details>
<p><span id="ConjugateRepresentation"><strong>Theorem</strong> (Conjugate Representation):</span> <em>|\pseq{s_n(x)}{n}| is Sheffer for
|(g, f)| if and only if
\[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Applying |\varepsilon_y| to both sides gives
\[\begin{align}
s_n(y)
&amp; = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} y^k \\
&amp; = \bigpair{\sum_{k=0}^n \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\
&amp; = \bigpair{\sum_{k=0}^\infty \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\
&amp; = \pair{g(\bar f(t))^{-1}\varepsilon_y(\bar f(t))}{x^n} \\
\end{align}\]
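but \[ s_n(y) = \bigpair{\sum_{k=0}^\infty \frac{s_k(y)}{c_k}t^k}{x^n} \]
so, again using that |\mathbb K| has characteristic |0|, the conjugate representation holds for all |n| if and only if
|\frac{1}{g(\bar f(t))}\varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k}t^k| for all |y \in \mathbb K|, and we can apply the <a href="#GeneratingFunction">Generating Function theorem</a>.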
|\square|
</details>
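<p>To make the conjugate representation concrete, here is a minimal Python sketch (illustrative names; truncated series as lists of coefficients) that evaluates it given the series |\bar f(t)| and |g(\bar f(t))^{-1}|, using |\pair{h(t)}{x^n} = c_n h_n| where |h_n| is the coefficient of |t^n| in |h|. As an assumed test case, take |c_n = n!|, |g(t) = 1| and |f(t) = e^t - 1|, so |\bar f(t) = \log(1+t)| and the output should be the lower factorials.</p>

```python
from fractions import Fraction


def mul(a, b, N):
    """Product of two power series (coefficient lists), truncated to degree N."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a[:N + 1]):
        for j, bj in enumerate(b[:N + 1 - i]):
            out[i + j] += ai * bj
    return out


def conjugate_representation(fbar, ginv_fbar, n, c):
    """Coefficients of s_n(x) = sum_k (pairing of g(fbar(t))^{-1} fbar(t)^k with x^n) / c_k * x^k.

    fbar is the compositional inverse of f, ginv_fbar is the series g(fbar(t))^{-1},
    and pairing a series h with x^n gives c_n times the coefficient of t^n in h."""
    coeffs = []
    h = mul(ginv_fbar, [Fraction(1)], n)  # g(fbar(t))^{-1} fbar(t)^0, truncated at t^n
    for k in range(n + 1):
        coeffs.append(Fraction(c[n]) * h[n] / c[k])
        h = mul(h, fbar, n)                # one more factor of fbar(t)
    return coeffs


# assumed test case: c_n = n!, g(t) = 1, f(t) = e^t - 1, so fbar(t) = log(1 + t)
n = 4
c = [1, 1, 2, 6, 24]
log_1_plus_t = [Fraction(0)] + [Fraction((-1) ** (k + 1), k) for k in range(1, n + 1)]
# coefficients of x^4 - 6x^3 + 11x^2 - 6x, i.e. x(x-1)(x-2)(x-3)
print(conjugate_representation(log_1_plus_t, [Fraction(1)], n, c))
```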
<p><span id="MultiplicationTheorem"><strong>Theorem</strong> (Multiplication Theorem):</span>
<em>Let |\pseq{s_n(x)}{n}| be Appell for |g|, then
\[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \]
for |\alpha\neq 0|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
\pair{t^k}{g(t/\alpha) s_n(\alpha x)}
&amp; = \pair{t^k g(t/\alpha)}{s_n(\alpha x)} \\
&amp; = \pair{(\alpha t)^k g(\alpha t/\alpha)}{s_n(x)} \tag{proposition (id: fpoz)} \\
&amp; = \alpha^k \pair{g(t) t^k}{s_n(x)} \\
&amp; = \alpha^k c_n \delta_k^n \\
&amp; = \alpha^n c_n \delta_k^n \\
&amp; = \alpha^n \pair{g(t)t^k}{s_n(x)} \\
&amp; = \pair{t^k}{\alpha^n g(t) s_n(x)}
\end{align}\]
so |g(t/\alpha) s_n(\alpha x) = \alpha^n g(t) s_n(x)|, and applying |g(t/\alpha)^{-1}| to both sides gives the result.
|\square|
</details>
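<p>As an illustration, take the Bernoulli polynomials (Appell for |g(t) = (e^t-1)/t| with |c_n = n!|) and |\alpha = 2|. Then |g(t)/g(t/2) = (e^{t/2}+1)/2|, and since |e^{at}| acts as |\varepsilon_a| here, the theorem specializes to |B_n(2x) = 2^{n-1}(B_n(x) + B_n(x + 1/2))|. The minimal Python sketch below checks this with exact rationals; the helpers and the Bernoulli-number recurrence |\sum_{j=0}^m {m+1 \choose j} B_j = 0| (with |B_1 = -1/2|) are standard facts assumed here, not taken from the source.</p>

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """B_0, ..., B_n with B_1 = -1/2, from sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


def bernoulli_poly(n, x, B):
    """B_n(x) = sum_k C(n, k) B_k x^(n-k), the Appell sequence for g(t) = (e^t - 1)/t."""
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))


def multiplication_theorem_alpha_2(n, x):
    """Check B_n(2x) == 2^(n-1) (B_n(x) + B_n(x + 1/2))."""
    B = bernoulli_numbers(n)
    lhs = bernoulli_poly(n, 2 * x, B)
    rhs = Fraction(2) ** (n - 1) * (bernoulli_poly(n, x, B) + bernoulli_poly(n, x + Fraction(1, 2), B))
    return lhs == rhs


print(all(multiplication_theorem_alpha_2(n, Fraction(3, 7)) for n in range(12)))  # True
```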
<p><span id="qqes"><strong>Theorem</strong> (id:qqes):</span> <em>|\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|
if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.</em></p>
<p><strong>Proof</strong>: Just apply adjointness to the definition. |\square|</p>
<p><span id="cutg"><strong>Theorem</strong> (id:cutg):</span> <em>A sequence |\pseq{p_n(x)}{n}| is the associated sequence for
|f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|,
and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N_+|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>|f(t)^0 = t^0| implies the first condition. For the second condition, if
|\pseq{p_n(x)}{n}| is associated to |f|, then for |k &gt; 0|
\[\begin{align}
\bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)}
&amp; = \frac{c_n}{c_{n-1}}\pair{f(t)^{k-1}}{p_{n-1}(x)} \\
&amp; = c_n \delta_k^n \\
&amp; = \pair{f(t)^k}{p_n(x)} \\
&amp; = \pair{f(t)^{k-1}}{f(t)p_n(x)}
\end{align}\] and |\pseq{f(t)^k}{k}| is a pseudobasis.</p>
Conversely, assuming (1) and (2) hold, then
\[\begin{align}
\pair{f(t)^k}{p_n(x)}
&amp; = \pair{t^0}{f(t)^k p_n(x)} \\
&amp; = \frac{c_n}{c_{n-k}} \pair{t^0}{p_{n-k}(x)} \tag{(2) k times} \\
&amp; = \frac{c_n}{c_{n-k}} c_0 \delta_{n-k}^0 \tag{(1)} \\
&amp; = c_n \delta_k^n
\end{align}\]
|\square|
</details>
<p><span id="hvdt"><strong>Theorem</strong> (id:hvdt):</span> <em>A sequence |\pseq{s_n(x)}{n}| is Sheffer for
|(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)| for all |n \in \mathbb N_+|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>For the forward implication, simply apply the <a href="#cutg">previous theorem</a> to
|\pseq{g(t)s_n(x)}{n}| which is associated to |f|, then apply |g(t)^{-1}| to the resulting
recurrence equation.</p>
For the reverse implication, let |\pseq{p_n(x)}{n}| be the associated sequence for |f| and
|U| be the linear operator defined by sending |s_n(x)| to |p_n(x)|. Then we have
\[
Uf(t)s_n(x)
= \frac{c_n}{c_{n-1}}Us_{n-1}(x)
= \frac{c_n}{c_{n-1}}p_{n-1}(x)
= f(t)p_n(x)
= f(t)Us_n(x)
\]
Since |\pseq{s_n(x)}{n}| is a basis, we see that |U| commutes with a delta series and thus
must be of the form |g(t)| for some |g| which is invertible because |U| preserves degree.
Thus |\pseq{g(t)s_n(x)}{n}| is associated to |f| which is equivalent to |\pseq{s_n(x)}{n}|
being Sheffer for |(g, f)|.
|\square|
</details>
<p>Iterating |k| times gives |f(t)^k s_n(x) = \frac{c_n}{c_{n-k}}s_{n-k}(x)| for |k \leq n| (and |f(t)^k s_n(x) = 0| for |k &gt; n|),
so writing |h(t) = \sum_{k=0}^\infty h_k t^k|, we get
\[ h(f(t)) s_n(x) = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k s_{n-k}(x) \]</p>
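<p><span id="tvdx"><strong>Corollary</strong> (id:tvdx):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{p_n(x)}{n}| be associated to |f|. Then</em>
\[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]</p>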
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Start by expanding |ts_n(x)| via the
<a href="#PolynomialExpansionTheorem">Polynomial Expansion Theorem</a>
\[\begin{align}
ts_n(x)
&amp; = \sum_{k=0}^\infty \frac{\pair{g(t)f(t)^k}{ts_n(x)}}{c_k} s_k(x) \\
&amp; = \sum_{k=0}^{n-1} \frac{\pair{t}{g(t)f(t)^k s_n(x)}}{c_k} s_k(x) \\
&amp; = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{g(t) s_{n-k}(x)} s_k(x) \tag{theorem (id:hvdt)} \\
&amp; = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \tag{theorem (id:qqes)}
\end{align}\]
|\square|
</details>
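<p>As a sanity check of this corollary, take |c_n = n!|, |g(t) = 1| and |f(t) = e^t - 1|, whose associated sequence is the lower factorials |(x)_n = x(x-1)\cdots(x-n+1)|. Then |s_n = p_n = (x)_n|, |t| acts as |D|, and |\pair{t}{p_m(x)}| is just the coefficient of |x| in |(x)_m|. The minimal Python sketch below (helper names are illustrative) compares |D(x)_n| with the right-hand side of the corollary.</p>

```python
from fractions import Fraction
from math import factorial


def times_x_minus(p, j):
    """Coefficients of (x - j) p(x), where p is a coefficient list (index = power of x)."""
    out = [Fraction(0)] * (len(p) + 1)
    for i, a in enumerate(p):
        out[i + 1] += a   # x * (a x^i)
        out[i] -= j * a   # -j * (a x^i)
    return out


def falling(n):
    """Lower factorial (x)_n = x (x - 1) ... (x - n + 1)."""
    p = [Fraction(1)]
    for j in range(n):
        p = times_x_minus(p, j)
    return p


def derivative(p):
    """D p(x); with c_n = n! this is the action of t on polynomials."""
    return [i * p[i] for i in range(1, len(p))]


def corollary_rhs(n):
    """sum_{k=0}^{n-1} c_n / (c_k c_{n-k}) * pair(t, p_{n-k}) * s_k(x) with s = p = lower factorials."""
    out = [Fraction(0)] * n
    for k in range(n):
        m = n - k
        pair_t = falling(m)[1]  # pairing t with p_m is c_1 times the x-coefficient, and c_1 = 1
        coeff = Fraction(factorial(n), factorial(k) * factorial(m)) * pair_t
        for i, a in enumerate(falling(k)):
            out[i] += coeff * a
    return out


print(all(derivative(falling(n)) == corollary_rhs(n) for n in range(10)))  # True
```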
<p><span id="ShefferIdentity"><strong>Theorem</strong> (Sheffer Identity):</span> <em>A sequence |\pseq{s_n(x)}{n}| is Sheffer for
|(g, f)| for some invertible |g| if and only if
\[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \]
for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>First, we’ll establish that
\[ \varepsilon_y(t) p_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x) \]</p>
<p>Applying |f(t)^k| to both sides of the equation leads to
\[\begin{align}
\pair{f(t)^k}{\varepsilon_y(t) p_n(x)}
&amp; = \pair{\varepsilon_y(t)}{f(t)^k p_n(x)} \\
&amp; = \frac{c_n}{c_{n-k}} \pair{\varepsilon_y(t)}{p_{n-k}(x)} \\
&amp; = \frac{c_n}{c_{n-k}} p_{n-k}(y) \\
&amp; = \sum_{i+j=n} \frac{c_n}{c_i c_{j-k}} p_i(y) \pair{t^0}{p_{j-k}(x)} \\
&amp; = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{t^0}{f(t)^k p_j(x)} \\
&amp; = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{f(t)^k}{p_j(x)} \\
&amp; = \bigpair{f(t)^k}{\sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x)}
\end{align}\]
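which is most easily read from outwards in. Since |\pseq{f(t)^k}{k}| is a pseudobasis, this establishes the equation.</p>
Using the same trick as in the proof of <a href="#hvdt">theorem (id:hvdt)</a>, we let |U| be the linear operator defined by
sending |s_n(x)| to |p_n(x)|. If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some |g|,
then |U = g(t)| by <a href="#qqes">theorem (id:qqes)</a>, and applying |U^{-1} = g(t)^{-1}| to both sides of the above equation gives the forward
direction since |g(t)^{-1}| commutes with |\varepsilon_y(t)|. Conversely, if we assume the Sheffer Identity, then applying |U| to both
sides shows that |U| commutes with every |\varepsilon_y(t)|, which implies it is of the form
|g(t)| for some |g|, invertible since |U| preserves degree, so |\pseq{g(t)s_n(x)}{n}| is associated to |f|.
|\square|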
|\square|
</details>
<p>As an example, we see that the Bernoulli polynomials |B_n(x)| are Sheffer for |(g, t)| for some
invertible |g| with |c_n = n!| for which the associated polynomials are |\pseq{x^n}{n}|.
The Sheffer Identity is thus the one mentioned in the Introduction.</p>
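<p>Here is a small exact-arithmetic check of this instance, as a minimal Python sketch. It assumes the standard facts that here |g(t) = (e^t - 1)/t|, so |B_n(x) = g(t)^{-1}x^n = \sum_k {n \choose k} B_k x^{n-k}| where |t/(e^t-1) = \sum_k B_k t^k/k!| (so |B_1 = -1/2|), and that the Bernoulli numbers satisfy |\sum_{j=0}^m {m+1 \choose j} B_j = 0| for |m \geq 1|. The function names are illustrative.</p>

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """B_0, ..., B_n with B_1 = -1/2, from sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B


def bernoulli_poly(n, x, B):
    """B_n(x) = sum_k C(n, k) B_k x^(n-k), i.e. g(t)^{-1} x^n for g(t) = (e^t - 1)/t."""
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))


def sheffer_identity_holds(n, x, y):
    """Exact check of B_n(x + y) == sum_k C(n, k) B_{n-k}(x) y^k."""
    B = bernoulli_numbers(n)
    lhs = bernoulli_poly(n, x + y, B)
    rhs = sum(comb(n, k) * bernoulli_poly(n - k, x, B) * y ** k for k in range(n + 1))
    return lhs == rhs


print(all(sheffer_identity_holds(n, Fraction(1, 3), Fraction(-2, 7)) for n in range(10)))  # True
```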
<p><span id="jtbs"><strong>Theorem</strong> (id:jtbs):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and
|\pseq{p_n(x)}{n}| be associated to |f|. For all |h, l \in \mathscr F|,
\[ \pair{h(t)l(t)}{s_n(x)}
= \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{h(t)}{p_i(x)} \pair{l(t)}{s_j(x)} \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Use the <a href="#ExpansionTheorem">Expansion Theorem</a> on |h| with respect to |\pseq{p_n(x)}{n}| and |l|
with respect to |\pseq{s_n(x)}{n}|. |h(t)l(t)| will then be
\[ \sum_{k=0}^\infty \sum_{i+j=k} \left(\frac{\pair{h(t)}{p_i(x)}}{c_i} f(t)^i\right) \left(\frac{\pair{l(t)}{s_j(x)}}{c_j} g(t)f(t)^j\right)
= \sum_{k=0}^\infty \left[\sum_{i+j=k} \frac{\pair{h(t)}{p_i(x)}\pair{l(t)}{s_j(x)}}{c_i c_j}\right] g(t)f(t)^k
\]
Pairing this with |s_n(x)|, only the |k = n| term survives, contributing |c_n| times the bracketed sum, which gives the result.
|\square|
</details>
<p>See the paper for an interesting alternative proof of this result.</p>
<h2 id="recurrence-formulas">Recurrence Formulas</h2>
<p>Given a linear operator |\mu| on |P|, the <strong>adjoint</strong> |\mu^*| is a linear operator on
|\mathscr F| characterized by:
\[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]</p>
<p>We can readily compute that:
\[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]</p>
<p><span id="ahdu"><strong>Theorem</strong> (id:ahdu):</span> <em>The adjoint to a linear operator on |P| is continuous.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Let |\pseq{f_k(t)}{k}| be a sequence with |\deg f_k \to \infty| as |k \to \infty|.
For each |n|, choose |K_n| such that |\deg f_k &gt; \max_{i=0}^n \deg \mu x^i| for all |k \geq K_n|.
Then |\pair{f_k(t)}{\mu x^m} = 0| for all |m \leq n| and |k \geq K_n|, so, using the formula above,
|\deg \mu^* f_k &gt; n| for all |k \geq K_n|, i.e. |\deg \mu^* f_k \to \infty|.
|\square|
</details>
<p>In fact, a linear operator on |\mathscr F| is an adjoint to one on |P| <em>if and only if</em> it is
continuous.</p>
<p><span id="fdui"><strong>Theorem</strong> (id:fdui):</span> <em>If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear
operator |\mu| on |P| such that |T = \mu^*|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
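Define |\mu x^n = \sum_{k=0}^\infty \frac{\pair{Tt^k}{x^n}}{c_k} x^k|, which
is well-defined because |T| being continuous means only finitely many |\pair{Tt^k}{x^n}| are
non-zero. By construction, |\pair{t^k}{\mu x^n} = \pair{Tt^k}{x^n}|, so |\mu^* t^k = Tt^k| for all |k|.
Since |\mu^*| (by <a href="#ahdu">theorem (id:ahdu)</a>) and |T| are both continuous and agree on the pseudobasis
|\pseq{t^k}{k}|, |T = \mu^*|.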
|\square|
</details>
<p>If |\pseq{p_n(x)}{n}| is associated to |f|, then the <strong>umbral shift</strong> |\theta_f|
associated to |f| is the linear operator on |P| defined by
\[ \theta_f p_n(x) = \frac{(n+1)c_n}{c_{n+1}} p_{n+1}(x) \]
for all |n \in \mathbb N|. In the case where |p_n(x) = x^n| and |c_n = n!| the umbral shift is
just multiplication by |x|. Since, famously, multiplication by |x| does <em>not</em> commute with
differentiation (|Dx - xD = 1| as operators), while every operator induced by a formal power series commutes with |t|,
which acts as |D| here, this umbral shift isn’t induced as a linear operator by a formal power series.
We’ll see below that this is true of every umbral shift.</p>
<p>A <strong>derivation</strong> on an algebra |A| is a linear operator |\partial| on |A| satisfying
\[ \partial(ab) = (\partial a)b + a\partial b \] for all |a, b \in A|.</p>
<p><span id="lxnr"><strong>Lemma</strong> (id:lxnr):</span> <em>A continuous linear operator |\partial| on |\mathscr F| is a derivation if
and only if |\partial 1 = 0| and, for some delta series |f| and some |g \in \mathscr F|, |\partial f(t)^k = kf(t)^{k-1}g(t)|
for all |k \in \mathbb N_+|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
A continuous derivation satisfies |\partial h(t) = h’(t)\partial t| from which the
result follows with |g(t) = f’(t)\partial t|. A continuous linear operator satisfying these laws
satisfies |\partial h(t) = h’(t)f’(t)^{-1}g(t)| which we can see by expanding |h(t)| in terms of the
pseudobasis |\pseq{f(t)^k}{k}| and using continuity to push the |\partial| into the sum.
Given this, for any |h, l \in \mathscr F|
\[\begin{align}
\partial (h(t)l(t))
&amp; = (h(t)l(t))'f’(t)^{-1}g(t) \\
&amp; = l(t)h’(t)f’(t)^{-1}g(t) + h(t)l’(t)f’(t)^{-1}g(t) \\
&amp; = (\partial h(t))l(t) + h(t)\partial l(t)
\end{align}\]
|\square|
</details>
<p><span id="oqyq"><strong>Theorem</strong> (id:oqyq):</span> <em>A linear operator |\theta| on |P| is the umbral shift for the delta series |f| if and
only if its adjoint |\theta^*| is a derivation on |\mathscr F| and
\[ \theta^* f(t)^k = kf(t)^{k-1} \] for all |k \in \mathbb N|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>If |\theta| is the umbral shift for |f| associated to |\pseq{p_n(x)}{n}|, then
\[\begin{align}
\pair{\theta^* f(t)^k}{p_n(x)}
&amp; = \pair{f(t)^k}{\theta p_n(x)} \\
&amp; = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \\
&amp; = (n+1)c_n \delta_k^{n+1} \\
&amp; = kc_n \delta_k^{n+1} \\
&amp; = kc_n \delta_{k-1}^n \\
&amp; = \pair{kf(t)^{k-1}}{p_n(x)}
\end{align}\]
We also have
|\pair{\theta^* t^0}{p_n(x)}
= \pair{t^0}{\theta p_n(x)}
= \frac{(n+1)c_n}{c_{n+1}} \pair{f(t)^0}{p_{n+1}(x)}
= (n+1)c_n \delta_0^{n+1}
= 0| since |n+1| is never |0|.
Since |\theta^*| is an adjoint, it’s continuous, and we can apply the
<a href="#lxnr">previous lemma</a> to conclude that it is a derivation.</p>
If |\theta^*| is a derivation satisfying |\theta^* f(t)^k = kf(t)^{k-1}| then
we can rearrange the equations of the first result to get:
\[\begin{align}
\pair{f(t)^k}{\theta p_n(x)}
&amp; = \pair{\theta^* f(t)^k}{p_n(x)} \\
&amp; = \pair{kf(t)^{k-1}}{p_n(x)} \\
&amp; = kc_n \delta_{k-1}^n \\
&amp; = kc_n \delta_k^{n+1} \\
&amp; = (n+1)c_n \delta_k^{n+1} \\
&amp; = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)}
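\end{align}\]
and since |\pseq{f(t)^k}{k}| is a pseudobasis, |\theta p_n(x) = \frac{(n+1)c_n}{c_{n+1}}p_{n+1}(x)|, so |\theta| is the umbral shift for |f|.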
|\square|
</details>
<p>In the particular case where |f(t) = t|, the above states that |\theta_t^* t^k = kt^{k-1}|,
i.e. |\theta_t^* h(t) = h’(t)| for all |h \in \mathscr F|. (Not to be confused with |\partial_t| which is
only the true derivative when |c_n = n!|.) We can easily compute that |\theta_t t = xD|.
Notably, this does <em>not</em> depend on the choice of |c|.</p>
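<p>This independence from |c| is easy to spot-check. In the minimal Python sketch below (helper names are illustrative), polynomials are coefficient lists, |t| acts by |t x^n = \frac{c_n}{c_{n-1}}x^{n-1}| (killing constants), and |\theta_t x^n = \frac{(n+1)c_n}{c_{n+1}}x^{n+1}|; the weights |c| are an arbitrary assumed list of nonzero rationals.</p>

```python
from fractions import Fraction


def make_ops(c):
    """Return (t, theta_t) acting on coefficient lists (index = power of x), for a list c of
    nonzero weights, where pairing t^k with x^n gives c[n] if k == n and 0 otherwise."""
    def t_op(p):
        # t x^n = (c_n / c_{n-1}) x^{n-1}, and t sends constants to 0
        return [Fraction(c[n + 1]) / c[n] * p[n + 1] for n in range(len(p) - 1)]

    def theta_t(p):
        # umbral shift for f(t) = t: x^n -> ((n+1) c_n / c_{n+1}) x^{n+1}
        return [Fraction(0)] + [Fraction((n + 1) * c[n]) / c[n + 1] * a for n, a in enumerate(p)]

    return t_op, theta_t


def monomial(n):
    return [Fraction(0)] * n + [Fraction(1)]


# an arbitrary (assumed) choice of nonzero weights, deliberately not n!
c = [Fraction(3), Fraction(-1, 2), Fraction(5), Fraction(7, 3), Fraction(2), Fraction(-11)]
t_op, theta_t = make_ops(c)
for n in range(5):
    # theta_t t x^n should equal n x^n, i.e. x D x^n
    print(n, theta_t(t_op(monomial(n))) == [0] * n + [n])
```

<p>By contrast, |\theta_t| itself is multiplication by |x| only when |c_{n+1} = (n+1)c_n| for all |n|, e.g. for |c_n = n!|.</p>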
<p><span id="kscz"><strong>Theorem</strong> (id:kscz):</span> <em>If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then
\[ \theta_t s_n(x)
= \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g’(t)}{s_{n-k}(x)}
+ \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f’(t)}{s_{n-k+1}(x)} \right] s_k(x) \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Start with the <a href="#PolynomialExpansionTheorem">Polynomial Expansion Theorem</a>.
\[\begin{align}
\theta_t s_n(x)
&amp; = \sum_{k=0}^{n+1} \frac{\pair{g(t)f(t)^k}{\theta_t s_n(x)}}{c_k} s_k(x) \\
&amp; = \sum_{k=0}^{n+1} \frac{\pair{\theta_t^*(g(t)f(t)^k)}{s_n(x)}}{c_k} s_k(x) \\
&amp; = \sum_{k=0}^{n+1} \left[\frac{\pair{g’(t)f(t)^k + kg(t)f(t)^{k-1}f’(t)}{s_n(x)}}{c_k}\right] s_k(x) \\
&amp; = \sum_{k=0}^{n+1} \left[\frac{\pair{g’(t)}{f(t)^k s_n(x)}
+ \pair{kg(t)f’(t)}{f(t)^{k-1} s_n(x)}}{c_k}\right] s_k(x) \\
&amp; = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}}\pair{g’(t)}{s_{n-k}(x)}
+ \frac{kc_n}{c_k c_{n-k+1}}\pair{g(t)f’(t)}{s_{n-k+1}(x)}\right] s_k(x)
\end{align}\]
|\square|
</details>
<p><span id="golc"><strong>Lemma</strong> (id:golc):</span> <em>A surjective derivation on |\mathscr F| is continuous.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
For any derivation |\partial|, we have |\partial 1 = \partial 1 + \partial 1| so
|\partial 1 = 0|. We also have |\partial t^k = kt^{k-1}\partial t|. Since
|\deg \partial t^0 = \infty| and |\deg \partial t^k = \deg \partial t + k - 1|, the only
way to get something of degree |0| and have |\partial| be surjective is if |\deg \partial t = 0|.
In general, if |\deg f = k| then |f(t) = t^k g(t)| where |\deg g = 0|. Therefore,
|\deg \partial f(t) = \deg (t^k \partial g(t) + kt^{k-1}g(t)\partial t)| and the right-hand
side has degree |k-1| implying |\partial| is continuous.
|\square|
</details>
<p><span id="gxkw"><strong>Theorem</strong> (id:gxkw):</span> <em>A surjective derivation on |\mathscr F| is adjoint to an umbral shift and vice versa.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
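<p>For the reverse direction, let |\theta| be the umbral shift for a delta series |f|. By <a href="#oqyq">theorem (id:oqyq)</a>,
|\theta^*| is a derivation and |\theta^* f(t)^{k+1} = (k+1)f(t)^k|, so |\pseq{\theta^* f(t)^{k+1}}{k}| is a pseudobasis, and since
|\theta^*| is continuous (being an adjoint), |\theta^*| is surjective.</p>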
For the forward direction, if |\partial| is a surjective derivation, then it is an adjoint of a
linear operator on |P| because it is continuous. We want to find the delta series |f| for which the
adjoint is the umbral shift. Generally, we have |\partial f(t)^k = kf(t)^{k-1} f’(t)\partial t|,
and we want |\partial f(t)^k = k f(t)^{k-1}| for <a href="#oqyq">theorem (id:oqyq)</a>.
Thus we want |f’(t)\partial t = 1| or |f’(t) = (\partial t)^{-1}| where |\partial t| is
invertible because |\partial| is surjective. We can solve this differential equation with
|f(0) = 0|, i.e. |\deg f = 1|, to determine the delta series |f|.
|\square|
</details>
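<p>The last step is a concrete computation: invert the series |\partial t| and integrate term by term with zero constant term. Here is a minimal Python sketch on truncated coefficient lists (names are illustrative). As an assumed example, take the continuous derivation with |\partial t = e^{-t}|, i.e. |\partial h(t) = h’(t)e^{-t}|; then |f’(t) = e^t| and the recovered delta series should be |f(t) = e^t - 1|.</p>

```python
from fractions import Fraction
from math import factorial


def series_inverse(u, n):
    """First n coefficients (n >= 1) of 1/u(t); requires u[0] != 0."""
    inv = [Fraction(1) / u[0]]
    for m in range(1, n):
        # pick inv[m] so that the t^m coefficient of u(t) * inv(t) vanishes
        s = sum((u[j] * inv[m - j] for j in range(1, min(m, len(u) - 1) + 1)), Fraction(0))
        inv.append(-s / u[0])
    return inv


def integrate(a):
    """Formal antiderivative with zero constant term."""
    return [Fraction(0)] + [Fraction(coeff) / (k + 1) for k, coeff in enumerate(a)]


def delta_series_from_derivation(partial_t, n):
    """Solve f'(t) = (partial t)^{-1} with f(0) = 0, keeping terms up to t^n."""
    return integrate(series_inverse(partial_t, n))


# assumed example: the continuous derivation with partial t = e^{-t}
N = 8
e_minus_t = [Fraction((-1) ** k, factorial(k)) for k in range(N)]
f = delta_series_from_derivation(e_minus_t, N)
print(f == [0] + [Fraction(1, factorial(k)) for k in range(1, N + 1)])  # True: f(t) = e^t - 1
```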
<p><span id="woxc"><strong>Lemma</strong> (id:woxc):</span> <em>If |f| and |g| are delta series, then
\[\theta_f^* = (\theta_f^* g(t))\theta_g^* \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
For any derivation |\partial|, we have |\partial a^k = (\partial a)ka^{k-1}|
and |\theta_f^*| is a derivation. Therefore
\[ \theta_f^* g(t)^k
= (\theta_f^* g(t))kg(t)^{k-1}
= (\theta_f^* g(t)) \theta_g^* g(t)^k \]
and |g(t)^k| for |k\in \mathbb N| is a pseudobasis, so, since both sides are continuous operators, this suffices to show they are
equal.
|\square|
</details>
<p><span id="dxii"><strong>Theorem</strong> (id:dxii):</span> <em>If |\theta_f| and |\theta_g| are umbral shifts, then
\[ \theta_f = \theta_g \circ (\theta_g^* f(t))^{-1} \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
\pair{t^k}{\theta_f p(x)}
&amp; = \pair{\theta_f^* t^k}{p(x)} \\
&amp; = \pair{(\theta_g^* f(t))^{-1}\theta_g^* t^k}{p(x)} \tag{lemma (id:woxc)} \\
&amp; = \pair{\theta_g^* t^k}{(\theta_g^* f(t))^{-1} p(x)} \\
&amp; = \pair{t^k}{\theta_g (\theta_g^* f(t))^{-1} p(x)}
\end{align}\]
|\square|
</details>
<p><span id="ouma"><strong>Theorem</strong> (id:ouma):</span> <em>If |\pseq{p_n(x)}{n}| is associated to |f|, then
\[ p_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_t(f’(t))^{-1} p_n(x) \]</em></p>
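<p><strong>Proof</strong>: By the <a href="#dxii">previous theorem</a> with |g(t) = t|, and since |\theta_t^* f(t) = f’(t)|, we have |\theta_f = \theta_t f’(t)^{-1}|. Applying this to |p_n(x)| and using the definition of the umbral shift |\theta_f| gives the result. |\square|</p>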
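<p>For instance, with |c_n = n!| the operator |t| acts as |D|, |\theta_t| is multiplication by |x|, |e^{at}| acts as |\varepsilon_a|, and |\frac{c_{n+1}}{(n+1)c_n} = 1|. Taking |f(t) = e^t - 1| (whose associated sequence is the lower factorials), |f’(t)^{-1} = e^{-t}|, so the theorem reads |p_{n+1}(x) = x\,p_n(x-1)|. A minimal Python sketch of that recurrence on coefficient lists (helper names are illustrative):</p>

```python
from fractions import Fraction
from math import comb, prod


def shift(p, a):
    """Coefficients of p(x + a) from those of p(x); this is epsilon_a when c_n = n!."""
    out = [Fraction(0)] * len(p)
    for n, coeff in enumerate(p):
        for k in range(n + 1):
            out[k] += coeff * comb(n, k) * Fraction(a) ** (n - k)
    return out


def associated_sequence_e_t_minus_1(N):
    """p_0, ..., p_N for f(t) = e^t - 1 via p_{n+1}(x) = theta_t f'(t)^{-1} p_n(x) = x p_n(x - 1)."""
    ps = [[Fraction(1)]]
    for _ in range(N):
        q = shift(ps[-1], -1)          # f'(t)^{-1} = e^{-t} acts as p(x) -> p(x - 1)
        ps.append([Fraction(0)] + q)   # theta_t acts as multiplication by x
    return ps


def evaluate(p, x):
    return sum(coeff * x ** n for n, coeff in enumerate(p))


ps = associated_sequence_e_t_minus_1(6)
# each p_n should be the lower factorial: p_n(m) == m (m - 1) ... (m - n + 1)
print(all(evaluate(ps[n], m) == prod(m - i for i in range(n)) for n in range(7) for m in range(10)))  # True
```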
<p><span id="xnvj"><strong>Lemma</strong> (id:xnvj):</span> <em>Let |\theta_f| be the umbral shift for |f|. Then
\[ \theta_f^*(h(t)) = h(t) \theta_f - \theta_f h(t) \]
for all |h \in \mathscr F|. The left-hand side is the linear operator on |P| induced by the
formal power series that is the output of |\theta_f^*|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
\pair{t^k}{\theta_f^*(h(t))x^n + \theta_f h(t) x^n}
&amp; = \pair{t^k}{\theta_f^*(h(t))x^n} + \pair{t^k}{\theta_f h(t) x^n} \\
&amp; = \pair{\theta_f^*(h(t)) t^k}{x^n} + \pair{h(t)\theta_f^*(t^k)}{x^n} \\
&amp; = \pair{\theta_f^*(h(t)) t^k + h(t)\theta_f^*(t^k)}{x^n} \\
&amp; = \pair{\theta_f^*(h(t) t^k)}{x^n} \\
&amp; = \pair{t^k}{h(t) \theta_f x^n}
\end{align}\]
|\square|
</details>
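<p>This lemma shows that <em>no</em> umbral shift has the form |g(t)| for a formal power series |g|: if it did, it would commute with every |h(t)|, forcing |\theta_f^* = 0|, whereas |\theta_f^* f(t) = 1|.</p>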
<p><span id="omlq"><strong>Theorem</strong> (id:omlq):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.
Then if |\theta_f| is the umbral shift for |f|,</em>
\[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]</p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
We have that |\pseq{g(t)s_n(x)}{n}| is associated to |f|, so by the definition of |\theta_f|
\[ g(t)s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_f g(t)s_n(x) \]
The <a href="#xnvj">previous lemma</a> then leads to
\[\begin{align}
s_{n+1}(x)
&amp; = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}\theta_f g(t)s_n(x) \\
&amp; = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}(g(t)\theta_f - \theta_f^*(g(t)))s_n(x) \\
&amp; = \frac{c_{n+1}}{(n+1)c_n} (\theta_f - g(t)^{-1}\theta_f^*(g(t)))s_n(x) \\
&amp; = \frac{c_{n+1}}{(n+1)c_n} (\theta_f + g(t)\theta_f^*(g(t)^{-1}))s_n(x)
\end{align}\]
where the final equality comes from
\[ 0 = \theta_f^*(1) = \theta_f^*(g(t)g(t)^{-1})
= g(t)^{-1}\theta_f^*(g(t)) + g(t)\theta_f^*(g(t)^{-1}) \]
|\square|
</details>
<p><span id="kusb"><strong>Theorem</strong> (id:kusb):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.</em>
\[ s_{n+1}(x)
= \frac{c_{n+1}}{(n+1)c_n}\left(\theta_t - \frac{g’(t)}{g(t)}\right) \frac{1}{f’(t)} s_n(x) \]</p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>Starting from the <a href="#omlq">previous theorem</a>
\[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]</p>
<a href="#dxii">Theorem (id:dxii)</a> gives |\theta_f = \theta_t f’(t)^{-1}|.
Since |\theta_f^* h(t) = h’(t)\theta_f^* t| for any |h|,
we first note that using <a href="#oqyq">theorem (id:oqyq)</a>,
|1 = \theta_f^* f(t) = f’(t) \theta_f^* t| or |\theta_f^* t = f’(t)^{-1}|.
Then
\[
g(t)\theta_f^*(g(t)^{-1})
= -\frac{g(t) g’(t)}{g(t)^2} \theta_f^* t
= -\frac{g’(t)}{g(t)} \theta_f^* t
= -\frac{g’(t)}{g(t)} f’(t)^{-1} \]
Substituting these into the <a href="#omlq">previous theorem</a> gives the result.
|\square|
</details>
<p><span id="vxhh"><strong>Theorem</strong> (id:vxhh):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. If
\[ T
= \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{f(t)}{f’(t)}
= \left(xD - \frac{tg’(t)}{g(t)}\right)\frac{f(t)}{tf’(t)} \]
then \[ Ts_n(x) = ns_n(x) \]
In other words, |s_n(x)| is an eigenfunction for |T| with eigenvalue |n|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
The second expression for |T| comes from inserting a |t/t| in the middle and using |\theta_t t = xD|.
\[
Ts_n(x)
= \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{1}{f’(t)}f(t)s_n(x)
= \frac{c_n}{c_{n-1}}\left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{1}{f’(t)}s_{n-1}(x)
= n s_n(x) \]
where <a href="#kusb">theorem (id:kusb)</a> and <a href="#hvdt">theorem (id:hvdt)</a> have been used.
|\square|
</details>
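<p>A concrete instance: for |c_n = n!|, |g(t) = 1| and |f(t) = e^t - 1|, we get |T = \theta_t f(t)/f’(t) = x(1 - e^{-t})|, i.e. |Tp(x) = x(p(x) - p(x-1))|, and the theorem says the lower factorials |(x)_n| are its eigenfunctions. A minimal Python check on coefficient lists (helper names are illustrative, not from the source):</p>

```python
from fractions import Fraction
from math import comb


def shift(p, a):
    """Coefficients of p(x + a) from those of p(x); this is epsilon_a when c_n = n!."""
    out = [Fraction(0)] * len(p)
    for n, coeff in enumerate(p):
        for k in range(n + 1):
            out[k] += coeff * comb(n, k) * Fraction(a) ** (n - k)
    return out


def times_x_minus(p, j):
    """Coefficients of (x - j) p(x)."""
    out = [Fraction(0)] * (len(p) + 1)
    for i, a in enumerate(p):
        out[i + 1] += a
        out[i] -= j * a
    return out


def falling(n):
    """Lower factorial x (x - 1) ... (x - n + 1)."""
    p = [Fraction(1)]
    for j in range(n):
        p = times_x_minus(p, j)
    return p


def trim(p):
    """Drop trailing zero coefficients so polynomials compare by value."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p


def T(p):
    """T = theta_t f(t)/f'(t) = x (1 - e^{-t}) for g = 1, f = e^t - 1, c_n = n!."""
    diff = [a - b for a, b in zip(p, shift(p, -1))]  # p(x) - p(x - 1)
    return [Fraction(0)] + diff                       # multiply by x


print(all(trim(T(falling(n))) == trim([n * a for a in falling(n)]) for n in range(8)))  # True
```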
<!--
Lemma 1 on page 81 (24) seems to have an unnecessary condition that the leading coefficient of |h|
is |1|. It is necessary for Lemma 2 so I assume it was just a copy/paste error. That said, the
theorem that uses these lemmas also repeats that assumption. I just can't see why that condition
is necessary.

TODO: Finish this section.
-->
<h2 id="transfer-formulas">Transfer Formulas</h2>
<p><span id="TransferFormula"><strong>Theorem</strong> (Transfer Formula):</span> <em>If |\pseq{p_n(x)}{n}| is the associated sequence
of |f|, then
\[ p_n(x) = f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \]
for all |n \in \mathbb N|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>We verify that the right-hand side meets the conditions of <a href="#cutg">theorem (id:cutg)</a>.
Condition (2) is easily verified:
\[\begin{align}
f(t) p_n(x)
&amp; = f(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\
&amp; = f’(t)\left(\frac{t}{f(t)}\right)^n t x^n \\
&amp; = \frac{c_n}{c_{n-1}} f’(t)\left(\frac{t}{f(t)}\right)^n x^{n-1} \\
&amp; = \frac{c_n}{c_{n-1}} p_{n-1}(x)
\end{align}\]</p>
<p>For condition (1), we start with a small trick by writing |f’(t) = [t(f(t)/t)]’|.
\[\begin{align}
\bigpair{t^0}{f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n}
&amp; = \bigpair{\left(t\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \\
&amp; = \bigpair{\left(\frac{t}{f(t)}\right)^n + t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \tag{product rule} \\
&amp; = \bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n}
\end{align}\]</p>
<p>We proceed from there by cases. In the |n = 0| case, we have
\[
\bigpair{t^0}{x^0} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)}{x^0}
= \pair{t^0}{x^0}
= c_0 \]</p>
For the |n &gt; 0| case, to simplify the expressions we’ll show that
\[ \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} = -\bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} \]
We proceed as follows
\[\begin{align}
\bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n}
&amp; = \bigpair{\left(\frac{f(t)}{t}\right)'\left(\frac{f(t)}{t}\right)^{-n-1}}{tx^n} \\
&amp; = -\frac{1}{n}\bigpair{\left[\left(\frac{f(t)}{t}\right)^{-n}\right]'}{tx^n} \\
&amp; = -\frac{1}{n}\bigpair{\theta_t^*\left[\left(\frac{f(t)}{t}\right)^{-n}\right]}{tx^n} \\
&amp; = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{\theta_t tx^n} \\
&amp; = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{xDx^n} \\
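&amp; = -\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{x^n} \\
&amp; = -\bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n}
\end{align}\]
so the two terms cancel and |\pair{t^0}{p_n(x)} = 0| for |n &gt; 0|, which completes the verification of condition (1).
|\square|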
</details>
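<p>The Transfer Formula is directly computable. The minimal Python sketch below (illustrative names; truncated power series as coefficient lists, |c| as an assumed list of nonzero weights) forms |f’(t)(t/f(t))^{n+1}| up to |t^n| and applies it to |x^n| using |t^k x^n = \frac{c_n}{c_{n-k}}x^{n-k}|. With |c_n = n!| and |f(t) = e^t - 1| it should reproduce the lower factorials.</p>

```python
from fractions import Fraction


def mul(a, b, N):
    """Product of two power series (coefficient lists), truncated to degree N."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a[:N + 1]):
        for j, bj in enumerate(b[:N + 1 - i]):
            out[i + j] += ai * bj
    return out


def inverse(u, N):
    """Coefficients of 1/u(t) up to t^N; requires u[0] != 0."""
    inv = [Fraction(1) / u[0]]
    for m in range(1, N + 1):
        s = sum((u[j] * inv[m - j] for j in range(1, min(m, len(u) - 1) + 1)), Fraction(0))
        inv.append(-s / u[0])
    return inv


def apply_series(h, n, c):
    """h(t) x^n = sum_k h_k (c_n / c_{n-k}) x^{n-k}, as a coefficient list in x."""
    out = [Fraction(0)] * (n + 1)
    for k in range(min(n, len(h) - 1) + 1):
        out[n - k] += h[k] * Fraction(c[n]) / c[n - k]
    return out


def transfer_formula(f, n, c):
    """p_n(x) = f'(t) (t/f(t))^{n+1} x^n; f needs f[0] == 0, f[1] != 0 and terms up to t^{n+1}."""
    f_prime = [k * f[k] for k in range(1, len(f))]
    t_over_f = inverse(f[1:], n)   # f(t)/t has coefficients f[1], f[2], ...
    h = f_prime
    for _ in range(n + 1):
        h = mul(h, t_over_f, n)
    return apply_series(h, n, c)


# assumed test case: c_n = n! and f(t) = e^t - 1, whose associated sequence is the lower factorials
c = [1, 1, 2, 6, 24, 120]
e_t_minus_1 = [Fraction(0)] + [Fraction(1, c[k]) for k in range(1, 6)]
print(transfer_formula(e_t_minus_1, 3, c))  # coefficients of x^3 - 3x^2 + 2x
```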
<p><span id="TransferFormulaAlt"><strong>Theorem</strong> (Transfer Formula, alternate form):</span> <em>If |\pseq{p_n(x)}{n}| is the
associated sequence of |f|, then
\[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \]
for all |n \geq 1|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
\bigpair{f(t)^k}{\frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1}}
&amp; = \frac{c_n}{nc_{n-1}}\bigpair{\theta_t^*[f(t)^k]}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\
&amp; = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}f’(t)}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\
&amp; = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}}{f’(t)\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\
&amp; = \frac{k}{n}\bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} \tag{Transfer Formula} \\
&amp; = \frac{k}{n}\pair{f(t)^{k-1}}{f(t)p_n(x)} \tag{theorem (id:cutg)} \\
&amp; = \frac{k}{n}\pair{f(t)^k}{p_n(x)} \\
&amp; = \frac{k}{n}c_n\delta_k^n \\
&amp; = c_n\delta_k^n \\
&amp; = \pair{f(t)^k}{p_n(x)}
\end{align}\]
|\square|
</details>
<p><span id="hcem"><strong>Corollary</strong> (id:hcem):</span> <em>Let |\pseq{p_n(x)}{n}| be associated to |f| and
|\pseq{q_n(x)}{n}| be associated to |gf| with |g| invertible. Then
\[ q_n(x) = \theta_t g(t)^{-n} \theta_t^{-1} p_n(x) \]
for all |n \geq 1|, where |\theta_t^{-1} x^{n+1} = (c_{n+1}/((n+1)c_n))x^n| and |\theta_t^{-1} 1 = 0|.</em></p>
<p><strong>Proof</strong>: Note that |\theta_t^{-1}\theta_t = 1| and write |q_n(x)| using the <a href="#TransferFormulaAlt">alternate form of the Transfer Formula</a>. |\square|</p>
<p><span id="jyhq"><strong>Theorem</strong> (id:jyhq):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|, and let |h| and |l| be invertible
series. Then the sequence |r_n(x) = h(t)l(t)^n s_n(x)| is Sheffer for
\[ \left(\frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Apply |g(t)| to both sides getting
\[ g(t)r_n(x) = h(t)l(t)^n g(t)s_n(x) \]
and noting that |\pseq{g(t)s_n(x)}{n}| is associated to |f| and applying the
<a href="#TransferFormula">Transfer Formula</a>, we get:
\[\begin{align}
g(t)r_n(x)
&amp; = h(t)l(t)^n f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\
&amp; = h(t)\frac{f’(t)}{l(t)} l(t)^{n+1} \left(\frac{t}{f(t)}\right)^{n+1} x^n \\
&amp; = h(t)\frac{f’(t)}{l(t)} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\
&amp; = h(t)\frac{f’(t)}{l(t)} \frac{[l(t)^{-1} f(t)]'}{[l(t)^{-1} f(t)]'} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\
&amp; = \frac{h(t)f’(t)}{[l(t)^{-1} f(t)]'l(t)} [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\
\end{align} \]
So we get
\[ \frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)} l(t)g(t)r_n(x)
= [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n
\]
which by the <a href="#TransferFormula">Transfer Formula</a> means these are associated
to |l(t)^{-1}f(t)| and thus |\pseq{r_n(x)}{n}| is Sheffer for
\[ \left(\frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \]
|\square|
</details>
<h2 id="umbral-composition-and-transfer-operators">Umbral Composition and Transfer Operators</h2>
<p>Let |\pseq{p_n(x)}{n}| be the associated sequence to |f|. The <strong>transfer</strong> or
<strong>umbral<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> operator</strong> for |\pseq{p_n(x)}{n}| (or |f|) is the linear operator
|\lambda_f| on |P| defined by \[ \lambda_f x^n = p_n(x) \]
This implies the adjoint operator is
\[ \lambda_f^* g(t) = \sum_{k=0}^\infty \frac{\pair{g(t)}{p_k(x)}}{c_k} t^k \]</p>
<p>The <a href="#TransferFormula">Transfer Formula</a> gives us the action of a transfer operator at
each monomial. Note that this doesn’t imply that a transfer operator is induced by a formal power
series, since the Transfer Formula uses a <em>different</em> formal power series for each monomial.</p>
<p><span id="oony"><strong>Lemma</strong> (id:oony):</span> <em>A |\mathbb K|-algebra homomorphism of |\mathscr F|
is an automorphism if and only if it preserves degree.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>For the forward direction, assume |T| is an automorphism. Let |f(t) = T^{-1}(t)| with
|\deg f = k| so |f(t) = t^kg(t)| with |g(t)| invertible. |k &gt; 0| since |\mathbb K|-algebra
homomorphisms send constants to themselves but then |T(f(t)) = t = T(t)^k T(g(t))| and the degrees
can only line up if |k = 1| and |\deg T(t) = 1|. |T| thus can’t reduce degree and by the same logic
neither can |T^{-1}|, so |T| must preserve degree.</p>
For the reverse direction, a degree preserving linear operator is continuous. We have that
|\pseq{T(t)^k}{k}| is a pseudobasis, so for any |f \in \mathscr F|, we can
write
\[ f(t) = \sum_{k=0}^\infty a_k T(t)^k = T\left(\sum_{k=0}^\infty a_k t^k\right) \]
so |g(t) = \sum_{k=0}^\infty a_k t^k| satisfies |f = T(g)| and for every |f| we can find
such a |g| making |T| surjective. The uniqueness of the |\pseq{a_k}{k}| implies |T| is injective
and thus bijective. This is enough for it to be an automorphism, since the inverse of a bijective algebra homomorphism is again an algebra homomorphism.
|\square|
</details>
<p><span id="lfum"><strong>Lemma</strong> (id:lfum):</span> <em>If |T| is a continuous |\mathbb K|-algebra
homomorphism on |\mathscr F| and |f, g \in \mathscr F| with |\deg f &gt; 0|,
then |T(g(t)) = g(T(t))| and |(Tg)(f(t)) = T(g(f(t)))|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
T(g(t))
&amp; = T\left(\sum_{k=0}^\infty g_k t^k\right) \\
&amp; = \sum_{k=0}^\infty g_k T(t)^k \\
&amp; = g(T(t))
\end{align}\]
and |(Tg)(f(t)) = g(T(f(t))) = T(g(f(t)))|.
|\square|
</details>
<p><span id="stji"><strong>Theorem</strong> (id:stji):</span> <em>A linear operator |\lambda| on |P| is the transfer
operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a
|\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|.
This makes |\lambda_f^*(g(t)) = g(\bar f(t))|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>For the forward direction, if |\lambda| is a transfer operator for |f| then we immediately get
\[ \lambda^* f(t)
= \sum_{k=0}^\infty \frac{\pair{f(t)}{p_k(x)}}{c_k} t^k
= \sum_{k=0}^\infty \delta_1^k t^k
= t \] and
\[\begin{align}
\pair{\lambda^*(g(t)h(t))}{x^n}
&amp; = \pair{g(t)h(t)}{\lambda x^n} \\
&amp; = \pair{g(t)h(t)}{p_n(x)} \\
&amp; = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{g(t)}{p_i(x)}\pair{h(t)}{p_j(x)} \tag{theorem (id:jtbs)} \\
&amp; = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{\lambda^* g(t)}{x^i}\pair{\lambda^* h(t)}{x^j} \\
&amp; = \pair{(\lambda^* g(t))(\lambda^* h(t))}{x^n}
\end{align}\]</p>
For the reverse direction,
\[\begin{align}
\pair{f(t)^k}{\lambda x^n}
&amp; = \pair{\lambda^*(f(t)^k)}{x^n} \\
&amp; = \pair{\lambda^*(f(t))^k}{x^n} \\
&amp; = \pair{t^k}{x^n} \\
&amp; = c_n \delta_k^n
\end{align}\]
so |\lambda x^n| has the characteristic property of |p_n(x)|.
|\square|
</details>
<p><span id="chiw"><strong>Corollary</strong> (id:chiw):</span> <em>A continuous |\mathbb K|-algebra automorphism
on |\mathscr F| is the adjoint of a transfer operator.</em></p>
<p><strong>Proof</strong>: By <a href="#fdui">theorem (id:fdui)</a> a continuous linear operator is adjoint to some
linear operator on |P|, and being an automorphism there is some |f \in \mathscr F| that gets
mapped to |t| by the automorphism so the <a href="#stji">previous theorem</a> applies.
|\square|</p>
<p>Summarizing some results of this form: There’s a bijection between continuous linear operators
on |\mathscr F| and linear operators on |P| via adjointness. Further, there’s a bijection between
continuous surjective derivations on |\mathscr F| and umbral shifts, and a bijection between
continuous |\mathbb K|-algebra automorphisms on |\mathscr F| and transfer operators.</p>
<p><span id="ocfq"><strong>Corollary</strong> (id:ocfq):</span> <em>Transfer operators form a group with
|(\lambda_f^*)^{-1} = \lambda_{\bar f}^*| and
|\lambda_f^* \circ \lambda_g^* = \lambda_{f\circ g}^*|, where |(f\circ g)(t) = f(g(t))|.</em></p>
<p><strong>Proof</strong>: This readily follows from |\lambda_f^* g(t) = g(\bar f(t))|. |\square|</p>
<p><span id="ydci"><strong>Theorem</strong> (id:ydci):</span></p>
<ol type="1">
<li><em>A transfer operator maps associated sequences to associated sequences.</em></li>
<li><em>If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}|
are associated to |f| and |g| respectively, and |\lambda| is a linear operator which
maps |p_n(x)| to |q_n(x)|, then |\lambda^* g(t) = f(t)|.</em></li>
<li><em>Additionally, |\lambda| is a transfer operator.</em></li>
</ol>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>For all of the following let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be associated to |f|
and |g| respectively and |\lambda_f|, |\lambda_g| be the transfer operators.</p>
<p>For 1,
\[ \pair{(\lambda_f^*)^{-1}(g(t)^k)}{\lambda_f q_n(x)}
= \pair{\lambda_f^*(\lambda_f^*)^{-1}(g(t)^k)}{q_n(x)}
= \pair{g(t)^k}{q_n(x)}
= c_n \delta_k^n \]
Since |(\lambda_f^*)^{-1}| is an automorphism, |(\lambda_f^*)^{-1}(g(t)^k) = ((\lambda_f^*)^{-1}(g(t)))^k|,
so |\lambda_f q_n(x)| is associated to |(\lambda_f^*)^{-1}(g(t))|.</p>
<p>For 2,
\[
\pair{\lambda^*(g(t))}{p_n(x)}
= \pair{g(t)}{\lambda p_n(x)}
= \pair{g(t)}{q_n(x)}
= c_n \delta_1^n
= \pair{f(t)}{p_n(x)}
\]
so |\lambda^* g(t) = f(t)| since |\pseq{p_n(x)}{n}| is a basis for |P|.</p>
For 3, by <a href="#lfum">lemma (id:lfum)</a>, we can just precompose with |\bar f| to get
|\lambda^*(g(\bar f(t))) = t| and do the same logic as 2 with |t^k| and |x^n|, implying that
|\lambda| is a transfer operator for |g(\bar f(t))|.
|\square|
</details>
<p>Let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be two polynomial
sequences. The <strong>umbral composition</strong> of |q| with |p| is written and defined as
\[ \ucomp{q}{p} = \sum_{k=0}^n q_{n,k}p_k(x) \]
where |q_n(x) = \sum_{k=0}^n q_{n,k}x^k|.
If |\lambda| is the transfer operator for |\pseq{p_n(x)}{n}|, then we have
\[ \ucomp{q}{p} = \lambda q_n(x) \]</p>
<p><span id="xvse"><strong>Theorem</strong> (id:xvse):</span> <em>If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}|
are associated to |f| and |g| respectively, then |\pseq{\ucomp{q}{p}}{n}|
is associated to |g(f(t))|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
\pair{g(f(t))^k}{\ucomp{q}{p}}
&amp; = \pair{g(f(t))^k}{\lambda_f q_n(x)} \\
&amp; = \pair{\lambda_f^*(g(f(t)))^k}{q_n(x)} \\
&amp; = \pair{g(\lambda_f^*(f(t)))^k}{q_n(x)} \\
&amp; = \pair{g(t)^k}{q_n(x)} \\
&amp; = c_n \delta_k^n
\end{align}\]
|\square|
</details>
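<p>As a concrete instance of <a href="#xvse">theorem (id:xvse)</a> in the classical case |c_n = n!|, here is a small Python sketch. It relies on two standard facts that are not derived in this section: the lower factorials |(x)_n = x(x-1)\cdots(x-n+1)| are associated to |e^t - 1|, and the exponential (Touchard) polynomials |\phi_n(x) = \sum_k S(n,k)x^k|, with |S(n,k)| the Stirling numbers of the second kind, are associated to |\log(1+t)|. Since |e^{\log(1+t)} - 1 = t| and |\log(1 + (e^t - 1)) = t|, umbral composition in either order should give the associated sequence of |t|, namely |x^n|. The helper names are made up for illustration.</p>

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (index i holds the x^i coefficient)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def falling_factorial(n):
    """(x)_n = x(x-1)...(x-n+1); its coefficients are the signed Stirling numbers of the first kind."""
    poly = [1]
    for i in range(n):
        poly = poly_mul(poly, [-i, 1])
    return poly

def touchard(n):
    """phi_n(x) = sum_k S(n, k) x^k, with S(m, k) = k S(m-1, k) + S(m-1, k-1)."""
    S = [[0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            S[m][k] = k * S[m - 1][k] + S[m - 1][k - 1]
    return S[n]

def umbral_compose(q_n, p):
    """Given q_n(x) = sum_k q_{n,k} x^k as a list and a sequence k -> p_k(x), return sum_k q_{n,k} p_k(x)."""
    out = [0] * len(q_n)
    for k, coeff in enumerate(q_n):
        for j, c in enumerate(p(k)):
            out[j] += coeff * c
    return out

print(umbral_compose(falling_factorial(3), touchard))  # [0, 0, 0, 1], i.e. x^3
print(umbral_compose(touchard(3), falling_factorial))  # [0, 0, 0, 1], i.e. x^3
```

<p>In this special case the check is just Stirling inversion: the connection constants between |x^n| and the lower factorials are the Stirling numbers.</p>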
<p><span id="xajh"><strong>Corollary</strong> (id:xajh):</span> <em>Umbral composition makes the set of associated
sequences into a group.</em></p>
<p><strong>Proof</strong>: It follows from the group structure of transfer operators. |\square|</p>
<p>A <strong>Sheffer operator</strong> is the linear operator |\mu_{g,f}| defined by
|\mu_{g,f}x^n = s_n(x)| where |\pseq{s_n(x)}{n}| is Sheffer for |(g,f)|.
By considering the associated sequence induced by a Sheffer sequence, i.e. |\pseq{g(t)s_n(x)}{n}|,
we readily get |\mu_{g,f} = g(t)^{-1}\lambda_f| so Sheffer operators can be reduced to transfer
operators.</p>
<p>The Bernoulli polynomials are an Appell sequence (with |c_n = n!|, so |t| acts as |D|), and we can immediately read off their |g| from their
<a href="https://en.wikipedia.org/wiki/Bernoulli_polynomials#Representation_by_a_differential_operator">representation by a differential operator</a>.
Namely, |g(t) = \frac{e^t - 1}{t} = \frac{\varepsilon_1(t) - 1}{t}|. Applying |t| to both sides of |g(t)B_n(x) = x^n| gives
|(\varepsilon_1(t) - 1)B_n(x) = tx^n|, which is the identity
|\Delta B_n(x) = Dx^n| where |\Delta| is the <a href="https://en.wikipedia.org/wiki/Forward_difference#Basic_types">forward difference operator</a>.</p>
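<p>A minimal Python sketch of both identities, assuming |c_n = n!| and the convention |B_1 = -\frac{1}{2}|, which is the one matching |g(t) = \frac{e^t - 1}{t}|. It builds |B_n(x) = \sum_k \binom{n}{k}B_k x^{n-k}| from the Bernoulli numbers via the standard recurrence |\sum_{k=0}^{m}\binom{m+1}{k}B_k = 0| for |m \geq 1|. It then applies |g(t)| using |h(t)x^n = \sum_k \frac{c_n}{c_{n-k}}h_k x^{n-k}| and checks |g(t)B_n(x) = x^n| and |\Delta B_n(x) = Dx^n| with exact rational arithmetic. The helper names are hypothetical.</p>

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(n):
    """B_0..B_n with B_1 = -1/2, from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def bernoulli_poly(n):
    """Coefficients (index i holds the x^i coefficient) of B_n(x) = sum_k C(n, k) B_k x^(n-k)."""
    B = bernoulli_numbers(n)
    return [comb(n, n - i) * B[n - i] for i in range(n + 1)]

def apply_series(h, poly):
    """Apply h(t) = sum_k h[k] t^k to a polynomial using t^k x^m = m!/(m-k)! x^(m-k), i.e. c_n = n!."""
    out = [Fraction(0)] * len(poly)
    for m, a in enumerate(poly):
        for k in range(min(m, len(h) - 1) + 1):
            out[m - k] += a * h[k] * (factorial(m) // factorial(m - k))
    return out

def forward_difference(poly):
    """(Delta p)(x) = p(x + 1) - p(x), expanding (x + 1)^m - x^m = sum_{j < m} C(m, j) x^j."""
    out = [Fraction(0)] * len(poly)
    for m, a in enumerate(poly):
        for j in range(m):
            out[j] += a * comb(m, j)
    return out

n = 4
g = [Fraction(1, factorial(k + 1)) for k in range(n + 1)]  # (e^t - 1)/t = sum_k t^k / (k+1)!
print(apply_series(g, bernoulli_poly(n)))     # should be the coefficient list of x^4
print(forward_difference(bernoulli_poly(n)))  # should be the coefficient list of 4x^3
```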
<p><span id="pnci"><strong>Theorem</strong> (id:pnci):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be
Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
\[\begin{align}
\pair{g(t)h(f(t))l(f(t))^k}{\ucomp{r}{s}}
&amp; = \pair{g(t)h(f(t))l(f(t))^k}{\mu_{g,f} r_n(x)} \\
&amp; = \pair{h(f(t))l(f(t))^k}{g(t)\mu_{g,f} r_n(x)} \\
&amp; = \pair{h(f(t))l(f(t))^k}{\lambda_f r_n(x)} \\
&amp; = \pair{\lambda_f^*(h(f(t))l(f(t))^k)}{r_n(x)} \\
&amp; = \pair{h(\lambda_f^*(f(t)))l(\lambda_f^*(f(t)))^k}{r_n(x)} \\
&amp; = \pair{h(t)l(t)^k}{r_n(x)} \\
&amp; = c_n \delta_k^n
\end{align}\]
|\square|
</details>
<p><span id="fony"><strong>Corollary</strong> (id:fony):</span>
\[ \pair{h(t)}{\mu_{g,f} q_n(x)}
= \pair{\mu_{g,f}^*(h(t))}{q_n(x)}
= \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]</p>
<p><strong>Proof</strong>: Immediate from definition of |\mu_{g,f}| and <a href="#stji">theorem (id:stji)</a>. |\square|</p>
<p>Given two polynomial sequences |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| related by
\[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \]
the <strong>connection-constants problem</strong> is to determine the constants |a_{n,k}|. When
|\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| are Sheffer for given formal power series pairs, we
can solve this problem as follows.</p>
<p><span id="hqtd"><strong>Theorem</strong> (id:hqtd):</span> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and
|\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. If
\[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \]
then the sequence |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is Sheffer for
\[ \left(\frac{h(\bar f(t))}{g(\bar f(t))}, l(\bar f(t)) \right) \]</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
Clearly, |r_n(x) = \ucomp{t}{s}|, so we just apply <a href="#pnci">theorem (id:pnci)</a>
and solve for the |u, v \in \mathscr F| such that |\pseq{t_n(x)}{n}| is Sheffer for |(u, v)|.
|\square|
</details>
<p><span id="ixkb"><strong>Corollary</strong> (id:ixkb):</span> <em>Let |\pseq{p_n(x)}{n}| be associated to |f| and
|\pseq{q_n(x)}{n}| be associated to |l| and
\[ q_n(x) = \sum_{k=0}^n a_{n,k} p_k(x) \]
then |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is associated to |l(\bar f(t))|.</em></p>
<p><strong>Proof</strong>: Immediate from the previous theorem. |\square|</p>
<p>Transfer operators give us a concise proof of the Lagrange Inversion Formula used for the
<a href="#cjme">compositional inverse</a>. The usual formula would arise from |g(t) = t| in the
below.</p>
<p><span id="LagrangeInversionFormula"><strong>Corollary</strong> (Lagrange Inversion Formula):</span>
<em>Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n &gt; 0|
\[ \pair{g(\bar f(t))}{x^n} = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \]
Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.</em></p>
<details>
<summary>
<strong>Proof</strong>: <span class="expander">(click to expand)</span>
</summary>
<p>From <a href="#stji">theorem (id:stji)</a>, |\lambda_f^*(g(t)) = g(\bar f(t))|.</p>
We conclude with a use of the <a href="#TransferFormulaAlt">alternate form of the Transfer Formula</a>
with |\pseq{p_n(x)}{n}| being associated to |f|.
\[\begin{align}
\pair{g(\bar f(t))}{x^n}
&amp; = \pair{\lambda_f^*(g(t))}{x^n} \\
&amp; = \pair{g(t)}{\lambda_f x^n} \\
&amp; = \pair{g(t)}{p_n(x)} \\
&amp; = \bigpair{g(t)}{\theta_t\left(\frac{t}{f(t)}\right)^n x^{n-1}} \tag{Transfer Formula, alternate form} \\
&amp; = \bigpair{\theta_t^*(g(t))\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \\
&amp; = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}}
\end{align}\]
|\square|
</details>
<p><span id="LagrangeInversionFormulaAlt"><strong>Corollary</strong> (Lagrange Inversion Formula, alternate form):</span>
<em>Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n &gt; 0|
\[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]</em></p>
<p><strong>Proof</strong>: Do the same proof as the <a href="#LagrangeInversionFormula">previous corollary</a>
just with the first form of the <a href="#TransferFormula">Transfer Formula</a>. |\square|</p>
<p><span id="LagrangeInversionFormulaHermite"><strong>Corollary</strong> (Lagrange Inversion Formula, Hermite form):</span>
<em>Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n &gt; 0|
\[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f’(\bar f(t))}}{x^n}
= \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]</em></p>
<p><strong>Proof</strong>: Apply the <a href="#LagrangeInversionFormulaAlt">previous corollary</a> with
|g(t) = h(t)\frac{f(t)}{tf’(t)}|. |\square|</p>
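<p>The first two forms are easy to check mechanically. The sketch below is a Python illustration, not part of the development. It takes |c_n = n!|, so |\pair{h(t)}{x^n} = n!\,[t^n]\,h(t)|, and the first form becomes |[t^n]\,g(\bar f(t)) = \frac{1}{n}[t^{n-1}]\,g'(t)\left(\frac{t}{f(t)}\right)^n|. Series are exact <code>Fraction</code> coefficient lists truncated modulo |t^N|. |\bar f| is computed directly by solving |f(\bar f(t)) = t| one coefficient at a time, independently of Lagrange inversion. The test series are arbitrary choices: |f(t) = te^{-t}|, whose inverse is the tree function |\sum_{n \geq 1} \frac{n^{n-1}}{n!}t^n|, and |g(t) = e^t|.</p>

```python
from fractions import Fraction
from math import factorial

N = 10  # power series are coefficient lists truncated modulo t^N

def pad(a):
    """Coerce to a length-N list of Fractions (index i holds the t^i coefficient)."""
    return ([Fraction(c) for c in a] + [Fraction(0)] * N)[:N]

def mul(a, b):
    out = pad([])
    for i in range(N):
        for j in range(N - i):
            out[i + j] += a[i] * b[j]
    return out

def recip(a):
    """1/a(t) for a[0] != 0, solved one coefficient at a time."""
    out = pad([1 / a[0]])
    for n in range(1, N):
        out[n] = -sum(a[k] * out[n - k] for k in range(1, n + 1)) / a[0]
    return out

def power(a, n):
    out = pad([1])
    for _ in range(n):
        out = mul(out, a)
    return out

def deriv(a):
    return pad([k * a[k] for k in range(1, N)])

def compose(a, b):
    """a(b(t)) by Horner's rule; requires b[0] == 0."""
    out = pad([])
    for coeff in reversed(a):
        out = mul(out, b)
        out[0] += coeff
    return out

def compositional_inverse(f):
    """bar f with f(bar f(t)) = t, found directly; requires f[0] == 0 and f[1] != 0."""
    fbar = pad([0, 1 / f[1]])
    for n in range(2, N):
        # Coefficient n of f(fbar) is f[1]*fbar[n] plus terms in fbar[1..n-1]; fbar[n] is 0 here, so cancel the rest.
        fbar[n] = -compose(f, fbar)[n] / f[1]
    return fbar

def lagrange(g, f, n):
    """[t^n] g(bar f(t)) via (1/n) [t^(n-1)] g'(t) (t/f(t))^n (first form, c_n = n!)."""
    t_over_f = recip(pad(f[1:]))
    return mul(deriv(g), power(t_over_f, n))[n - 1] / n

def lagrange_alt(g, f, n):
    """[t^n] g(bar f(t)) via [t^n] g(t) f'(t) (t/f(t))^(n+1) (alternate form, c_n = n!)."""
    t_over_f = recip(pad(f[1:]))
    return mul(mul(g, deriv(f)), power(t_over_f, n + 1))[n]

f = pad([0] + [Fraction((-1) ** k, factorial(k)) for k in range(N - 1)])  # f(t) = t e^{-t}
g = pad([Fraction(1, factorial(k)) for k in range(N)])                     # g(t) = e^t
fbar = compositional_inverse(f)
```

<p>The comparisons should stop at |n = N - 2|, because truncating |f| at |t^{N-1}| only pins down that many coefficients of |f'(t)| and |\frac{t}{f(t)}|.</p>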
<h2 id="example-chebyshev-polynomials">Example: Chebyshev Polynomials</h2>
<p>While the “classical” umbral calculus generally used |c_n = n!|, one interesting (orthogonal)
polynomial sequence that benefits from the extra flexibility is Chebyshev polynomials where
we’ll use |c_n = (-1)^n|. (This can be viewed as a special case of Gegenbauer polynomials which
use |c_n = {-\lambda \choose n}^{-1}| which reduces to the Chebyshev case when |\lambda = 1|.)</p>
<p>The book “The Umbral Calculus” mentioned in the <a href="#introduction">introduction</a> primarily
covers the “classical” case and has many examples of it. It also covers the “non-classical” case,
albeit somewhat as an afterthought.</p>
<p>With this choice, |tx^n = \frac{c_n}{c_{n-1}}x^{n-1} = -x^{n-1}| for |n \geq 1| and |tx^0 = 0|, so |tp(x) = -\frac{p(x) - p(0)}{x}|.</p>
<p>|\pseq{T_n(x)}{n}| is Sheffer for |(g, f)| where |g(t) = (1-t^2)^{-2}|
and |f(t) = \frac{\sqrt{1-t^2} - 1}{t} = \frac{-t}{1 + \sqrt{1 - t^2}}|.
|\bar f(t) = \frac{-2t}{1+t^2}| and
|f’(t) = \frac{-1}{\sqrt{1-t^2}(1 + \sqrt{1 - t^2})} = \frac{f(t)}{t\sqrt{1-t^2}}|.</p>
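<p>The delta series |f| satisfies |tf(t)^2 + 2f(t) + t = 0|. Applying this to |s_{n+1}(x)|, using
|f(t)s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)| from <a href="#hvdt">theorem (id:hvdt)</a>, and multiplying through by |\frac{c_n}{c_{n+1}}| gives the
recurrence
\[ \frac{c_n}{c_{n+1}}ts_{n+1}(x) + \frac{c_n}{c_{n-1}}ts_{n-1}(x) + 2s_n(x) = 0 \]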
for any Sheffer sequence with |f(t)| as its delta series. For the Chebyshev polynomials, this
simplifies to \[ 2xT_n(x) + T_{n+1}(x) + T_{n-1}(x) = 0 \]</p>
<p>|\theta_t x^n = -(n+1)x^{n+1}| which we can compute from |\theta_t t = xD| which takes the form
|\theta_t t x^n = -\theta_t x^{n-1} = nx^n|. This leads to |\theta_t = -x(1 + xD)|.</p>
<p>From <a href="#vxhh">theorem (id:vxhh)</a>, we get |TT_n(x) = nT_n(x)| where</p>
<p>\[ T = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{f(t)}{f’(t)}
= \left(xD - \frac{tg’(t)}{g(t)}\right)\frac{f(t)}{tf’(t)} \]</p>
<p>This leads to |nT_n(x) = (xD - (1-t^2)^{-1})\sqrt{1-t^2}T_n(x)|.</p>
<h2 id="summary">Summary</h2>
<p>A formal power series |f| is <strong>invertible</strong> iff |\deg f = 0| and a <strong>delta series</strong> iff
|\deg f = 1|. We call a linear functional/operator induced by an invertible/delta series an
<strong>invertible</strong>/<strong>delta</strong> <strong>functional</strong>/<strong>operator</strong>.</p>
<p>The defining property of |\pseq{s_n(x)}{n}| being <strong>Sheffer</strong> for |(g, f)| where |g| is an
invertible series and |f| is a delta series is
\[ \pair{g(t)f(t)^k}{s_n(x)} = \pair{t^k}{x^n} = c_n \delta_k^n \]</p>
<p>If |g(t) = 1|, then we say |\pseq{s_n(x)}{n}| is the <strong>associated sequence</strong> for |f|,
and usually we’ll use |\pseq{p_n(x)}{n}| instead.</p>
<p>If |f(t) = t|, then we say |\pseq{s_n(x)}{n}| is the <strong>Appell sequence</strong> for |g|.</p>
<p>Formal power series act as operators on |P| via
\[ t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k} \]
and generally,
\[h(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k x^{n-k} \]</p>
<p>This generalizes to Sheffer sequences as
\[h(f(t)) s_n(x) = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k s_{n-k}(x) \]
where |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g \in \mathscr F|.</p>
<p>A linear operator |T| on |\mathscr F| is <strong>continuous</strong> if given
|\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have
|\deg T(f_k) \to \infty| as |k \to \infty|.</p>
<p>Given a linear operator |\mu| on |P|, the <strong>adjoint</strong> |\mu^*| is a linear operator on
|\mathscr F| characterized by:
\[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]</p>
<p>Its <strong>adjoint expansion</strong> is:
\[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]</p>
<p>This applies to any linear operator on |P|, and in particular to umbral shifts and transfer operators.</p>
<p><a href="#ahdu"><strong>Theorem</strong> (id:ahdu):</a> <em>The adjoint to a linear operator on |P| is continuous.</em></p>
<p><a href="#fdui"><strong>Theorem</strong> (id:fdui):</a> <em>If |T| is a continuous linear operator on
|\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.</em></p>
<p><a href="#GeneratingFunction"><strong>Theorem</strong> (Generating Function):</a> <em>|\pseq{s_n(x)}{n}| is
Sheffer for |(g, f)| if and only if
\[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t))
= \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \]
for all |y \in \mathbb K|.</em></p>
<p><a href="#ShefferIdentity"><strong>Theorem</strong> (Sheffer Identity):</a> <em>A sequence |\pseq{s_n(x)}{n}| is Sheffer for
|(g, f)| for some invertible |g| if and only if
\[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \]
for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.</em></p>
<p><a href="#cutg"><strong>Theorem</strong> (id:cutg):</a> <em>A sequence |\pseq{p_n(x)}{n}| is the associated sequence for
|f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|,
and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N_+|.</em></p>
<p><a href="#TransferFormula"><strong>Theorem</strong> (Transfer Formula):</a> <em>If |\pseq{p_n(x)}{n}| is the
associated sequence of |f|, then
\[ p_n(x) = f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \]
for all |n \in \mathbb N|.</em></p>
<p><a href="#TransferFormulaAlt"><strong>Theorem</strong> (Transfer Formula, alternate form):</a> <em>If
|\pseq{p_n(x)}{n}| is the associated sequence of |f|, then
\[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \]
for all |n \geq 1|.</em></p>
<p><a href="#qqes"><strong>Theorem</strong> (id:qqes):</a> <em>|\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|
if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.</em></p>
<p><a href="#hvdt"><strong>Theorem</strong> (id:hvdt):</a> <em>A sequence |\pseq{s_n(x)}{n}| is Sheffer for
|(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|.</em></p>
<p><a href="#omlq"><strong>Theorem</strong> (id:omlq):</a> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.
Then if |\theta_f| is the umbral shift for |f|,</em>
\[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]</p>
<p><a href="#ExpansionTheorem"><strong>Theorem</strong> (Expansion Theorem):</a> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.
Then for any |h \in \mathscr F|,
\[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]</em></p>
<p><a href="#PolynomialExpansionTheorem"><strong>Corollary</strong> (Polynomial Expansion Theorem):</a> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|.
Then for a |p \in P|,
\[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]</em></p>
<p><a href="#stji"><strong>Theorem</strong> (id:stji):</a> <em>A linear operator |\lambda| on |P| is the transfer
operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a
|\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|.
This makes |\lambda_f^*(g(t)) = g(\bar f(t))|.</em></p>
<p><a href="#pnci"><strong>Theorem</strong> (id:pnci):</a> <em>Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be
Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.</em></p>
<p><a href="#fony"><strong>Corollary</strong> (id:fony):</a>
\[ \pair{h(t)}{\mu_{g,f} q_n(x)}
= \pair{\mu_{g,f}^*(h(t))}{q_n(x)}
= \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]</p>
<p><a href="#kscz"><strong>Theorem</strong> (id:kscz):</a> <em>If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then
\[ \theta_t s_n(x)
= \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g’(t)}{s_{n-k}(x)}
+ \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f’(t)}{s_{n-k+1}(x)} \right] s_k(x) \]</em></p>
<p><a href="#tvdx"><strong>Corollary</strong> (id:tvdx):</a>
\[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]</p>
<p><a href="#ConjugateRepresentation"><strong>Theorem</strong> (Conjugate Representation):</a> <em>|\pseq{s_n(x)}{n}| is Sheffer for
|(g, f)| if and only if
\[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]</em></p>
<p><a href="#MultiplicationTheorem"><strong>Theorem</strong> (Multiplication Theorem):</a>
<em>Let |\pseq{s_n(x)}{n}| be Appell for |g|, then
\[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \]
for |\alpha\neq 0|.</em></p>
<p><a href="#LagrangeInversionFormula"><strong>Corollary</strong> (Lagrange Inversion Formula):</a>
<em>Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n &gt; 0|
\[ \pair{g(\bar f(t))}{x^n} = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \]
Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.</em></p>
<p><a href="#LagrangeInversionFormulaAlt"><strong>Corollary</strong> (Lagrange Inversion Formula, alternate form):</a>
<em>Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n &gt; 0|
\[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]</em></p>
<p><a href="#LagrangeInversionFormulaHermite"><strong>Corollary</strong> (Lagrange Inversion Formula, Hermite form):</a>
<em>Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n &gt; 0|
\[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f’(\bar f(t))}}{x^n}
= \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]</em></p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>This isn’t a basis because we’re allowing countably infinite sums of the elements
of the pseudobasis, whereas for a basis, even with infinite elements, we’d be claiming that every
element is a <em>finite</em> sum of the basis elements.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>As far as I can tell, “umbral operator” is used in the “classical” |c_n = n!| case
while “transfer operator” is more general. I’m sure people use “umbral operator” generally too,
though.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2025-11-09 16:36:38-08:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/umbral-calculus.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>Arithmetic Functions</title>
    <link>https://www.hedonisticlearning.com/posts/arithmetic-functions.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>I want to talk about one of the many pretty areas of number theory. This involves the notion of an
arithmetic function and related concepts. A few relatively simple concepts will allow us to produce
a variety of useful functions and theorems. This provides only a glimpse of the start of the
field of analytic number theory, though many of these techniques are used in other places as we’ll
also start to see.</p>
<p>(See <a href="#summary">the end for a summary of identities and results</a>.)</p>
<ul>
<li><a href="#prelude">Prelude</a></li>
<li><a href="#arithmetic-functions">Arithmetic Functions</a></li>
<li><a href="#dirichlet-series">Dirichlet Series</a></li>
<li><a href="#dirichlet-convolution">Dirichlet Convolution</a></li>
<li><a href="#euler-product-formula">Euler Product Formula</a></li>
<li><a href="#möbius-inversion">Möbius Inversion</a>
<ul>
<li><a href="#lambert-series">Lambert Series</a></li>
<li><a href="#inclusion-exclusion">Inclusion-Exclusion</a></li>
<li><a href="#varphi">|\varphi|</a></li>
</ul></li>
<li><a href="#combinatorial-species">Combinatorial Species</a></li>
<li><a href="#derivative-of-dirichlet-series">Derivative of Dirichlet series</a></li>
<li><a href="#dirichlet-inverse">Dirichlet Inverse</a></li>
<li><a href="#more-examples">More Examples</a>
<ul>
<li><a href="#lambda-and-gamma">|\lambda| and |\gamma|</a></li>
<li><a href="#indicator-functions">Indicator Functions</a></li>
</ul></li>
<li><a href="#summatory-functions">Summatory Functions</a>
<ul>
<li><a href="#mellin-transform">Mellin Transform</a></li>
</ul></li>
<li><a href="#summary">Summary</a>
<ul>
<li><a href="#properties">Properties</a>
<ul>
<li><a href="#dirichlet-convolution-1">Dirichlet Convolution</a></li>
<li><a href="#dirichlet-inverse-1">Dirichlet Inverse</a></li>
<li><a href="#convolution-theorem">Convolution Theorem</a></li>
<li><a href="#möbius-inversion-1">Möbius Inversion</a></li>
<li><a href="#dirichlet-series-1">Dirichlet Series</a></li>
<li><a href="#euler-product-formula-1">Euler Product Formula</a></li>
<li><a href="#lambert-series-1">Lambert Series</a></li>
</ul></li>
<li><a href="#arithmetic-function-definitions">Arithmetic function definitions</a></li>
<li><a href="#dirichlet-convolutions">Dirichlet convolutions</a></li>
<li><a href="#dirichlet-series-2">Dirichlet series</a></li>
</ul></li>
</ul>
<h2 id="prelude">Prelude</h2>
<p>As some notation, I’ll write |\mathbb N_+| for the set of positive naturals, and |\mathbb P| for
the set of primes. |\mathbb N| will contain |0|. Slightly atypically, I’ll write |[n]| for the set
of numbers from |1| to |n| inclusive, i.e. |a \in [n]| if and only if |1 \leq a \leq n|.</p>
<p>I find that the easiest way to see results in number theory
is to view a positive natural number as a multiset of primes which is uniquely given by
factorization. Coprime numbers are ones where these multisets are disjoint. Multiplication is the
multiset sum, which adds multiplicities. The greatest common divisor is multiset intersection. |n| divides |m| if and only
if |n| corresponds to a sub-multiset of |m|, in which case |m/n| corresponds to the multiset
difference. The <strong>multiplicity</strong> of an element of a multiset is the number of occurrences.
For a multiset |P|, |\mathrm{dom}(P)| is the <em>set</em> of elements of the multiset |P|, i.e. those
with multiplicity greater than |0|. For a finite multiset |P|, |\vert P\vert| will be the sum of
the multiplicities of the distinct elements, i.e. the number of elements (with duplicates) in the
multiset.</p>
<p>We can represent a multiset of primes as a function |\mathbb P \to \mathbb N| which maps an
element to its multiplicity. A <em>finite</em> multiset would then be such a function that is |0| at all
but finitely many primes. Alternatively, we can represent the multiset as a <em>partial</em> function
|\mathbb P \rightharpoonup \mathbb N_+|. It will be finite when it is defined for only finitely
many primes. Equivalently, when it is a finite subset of |\mathbb P\times\mathbb N_+| (which is
also a functional relation).</p>
<p>Unique factorization provides a bijection between finite multisets of primes and positive natural
numbers. Given a finite multiset |P|, the corresponding positive natural number is
|n_P = \prod_{(p, k) \in P} p^k|.</p>
<p>I will refer to this view often in the following.</p>
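<p>This view maps directly onto Python’s <code>collections.Counter</code>. Its <code>+</code>, <code>&amp;</code> and <code>-</code> operators are multiset sum, intersection (minimum of multiplicities) and difference (dropping non-positive counts). Below is a small sketch that uses naive trial division for factorization, which is fine for small |n|. The helper names are my own.</p>

```python
from collections import Counter
from math import gcd, prod

def factor(n):
    """Multiset of prime factors of n >= 1, as a Counter {p: multiplicity}; naive trial division."""
    P = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            P[d] += 1
            n //= d
        d += 1
    if n > 1:
        P[n] += 1
    return P

def number(P):
    """n_P: the product of p^k over the (p, k) pairs of the multiset P."""
    return prod(p ** k for p, k in P.items())

def divides(a, b):
    """a | b iff the multiset of a is a sub-multiset of the multiset of b."""
    A, B = factor(a), factor(b)
    return all(A[p] <= B[p] for p in A)

A, B = factor(12), factor(90)                  # 12 = 2^2 * 3, 90 = 2 * 3^2 * 5
print(number(A + B))                           # product (multiset sum): 1080
print(number(A & B), gcd(12, 90))              # gcd (intersection): 6 6
print(divides(6, 90), number(B - factor(6)))   # True 15, and 90/6 = 15 (difference)
```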
<h2 id="arithmetic-functions">Arithmetic Functions</h2>
<p>An <a href="https://en.wikipedia.org/wiki/Arithmetic_function"><strong>arithmetic function</strong></a> is just a function
defined on the positive naturals. Usually, they’ll land in (not necessarily positive) natural
numbers, but that isn’t required.</p>
<p>In most cases, we’ll be interested in the specific subclass of multiplicative arithmetic functions.
An arithmetic function, |f|, is <strong>multiplicative</strong> if |f(1) = 1| and |f(ab) = f(a)f(b)| whenever
|a| and |b| are coprime. We also have the notion of a <strong>completely multiplicative</strong> arithmetic
function for which |f(ab) = f(a)f(b)| always. Obviously, completely multiplicative functions are
multiplicative. Analogously, we also have a notion of (<strong>completely</strong>) <strong>additive</strong> where
|f(ab) = f(a) + f(b)| whenever |a| and |b| are coprime (respectively, always). <em>Warning</em>: In other mathematical contexts, “additive” means |f(a+b)=f(a)+f(b)|.
An obvious example of a completely additive function is the logarithm. Exponentiating an additive
function will produce a multiplicative function.</p>
<p>For an additive function, |f|, we automatically get |f(1) = 0| since |f(1) = f(1\cdot 1) = f(1) + f(1)|.</p>
<p><strong>Lemma</strong>: The product of two multiplicative functions |f| and |g| is multiplicative.<br/>
<strong>Proof</strong>: For |a| and |b| coprime, |f(ab)g(ab) = f(a)f(b)g(a)g(b) = f(a)g(a)f(b)g(b)|. |\square|</p>
<p>A parallel statement holds for completely multiplicative functions.</p>
<p>It’s also clear that a completely multiplicative function is entirely determined by its action on
prime numbers. Since |p^n| is coprime to |q^m| whenever |p| and |q| are distinct primes, we see
that a multiplicative function is entirely determined by its action on powers of primes. To this
end, I’ll often define multiplicative/additive functions by their action on prime powers and
completely multiplicative/additive functions by their action on primes.</p>
<p>Multiplicative functions aren’t closed under composition, but we do have that if |f| is <em>completely</em>
multiplicative and |g| is multiplicative, then |f \circ g| is multiplicative when that composite
makes sense.</p>
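<p>A quick finite check of these closure claims (a Python sketch; it tests pairs below a small bound, so it illustrates but does not prove anything). The example function is my choice: the divisor-counting function |d(n)|, computed by brute force, which is multiplicative with |d(p^k) = k+1| but not completely multiplicative. Squaring is completely multiplicative, so |n \mapsto d(n)^2| should be multiplicative. Putting the merely multiplicative |d| on the outside can fail.</p>

```python
from math import gcd

def d(n):
    """Number of divisors of n, by brute force."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

def is_multiplicative(f, limit=30):
    """Check f(1) = 1 and f(ab) = f(a) f(b) for all coprime a, b below limit."""
    return f(1) == 1 and all(
        f(a * b) == f(a) * f(b)
        for a in range(1, limit) for b in range(1, limit) if gcd(a, b) == 1
    )

def is_completely_multiplicative(f, limit=30):
    """Check f(1) = 1 and f(ab) = f(a) f(b) for all a, b below limit."""
    return f(1) == 1 and all(
        f(a * b) == f(a) * f(b) for a in range(1, limit) for b in range(1, limit)
    )

def square(n):
    """A completely multiplicative function."""
    return n * n

print(is_multiplicative(d), is_completely_multiplicative(d))  # True False, since d(4) = 3 but d(2)^2 = 4
print(is_multiplicative(lambda n: square(d(n))))              # True
print(is_multiplicative(lambda n: d(d(n))))                   # False: d(d(6)) = 3 but d(d(2)) d(d(3)) = 4
```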
<p>Here are some examples. Not all of these will be used in the sequel.</p>
<ul>
<li>The power function |({-})^z| for any |z|, not necessarily an integer, is completely multiplicative.</li>
<li>Choosing |z=0| in the previous, we see the constantly one function |\bar 1(n) = 1| is completely
multiplicative.</li>
<li>The identity function is clearly completely multiplicative and is also the |z=1| case of the above.</li>
<li>The unit function, i.e. the indicator function for |1|, is |\varepsilon(n) = \begin{cases}1, &amp; n = 1 \\ 0, &amp; n \neq 1\end{cases}|
and is completely multiplicative.</li>
<li>Define a multiplicative function via |\mu(p^n) = \begin{cases} -1, &amp; n = 1 \\ 0, &amp; n &gt; 1\end{cases}|
where |p| is prime. This is the <a href="https://en.wikipedia.org/wiki/M%C3%B6bius_function">Möbius function</a>.
More holistically, |\mu(n)| is |0| if |n| has any square factors, otherwise |\mu(n) = (-1)^k|
where |k| is the number of (distinct) prime factors.</li>
<li>Define a completely multiplicative function via |\lambda(p) = -1|. |\lambda(n) = \pm 1|
depending on whether there is an even or odd number of prime factors (including duplicates).
This function is known as the <a href="https://en.wikipedia.org/wiki/Liouville_function">Liouville function</a>.</li>
<li>|\lambda(n) = (-1)^{\Omega(n)}| where |\Omega(n)| is the completely additive function which
counts the number of prime factors of |n| including duplicates. |\Omega(n_P) = \vert P\vert|.</li>
<li>Define a multiplicative function via |\gamma(p^n) = -1|. |\gamma(n) = \pm 1| depending on
whether there is an even or odd number of <em>distinct</em> prime factors.</li>
<li>|\gamma(n) = (-1)^{\omega(n)}| where |\omega(n)| is the additive function which counts the
number of <em>distinct</em> prime factors of |n|. See <a href="https://en.wikipedia.org/wiki/Prime_omega_function">Prime omega function</a>.
We also see that |\omega(n_P) = \vert\mathrm{dom}(P)\vert|.</li>
<li>The completely additive function for |q\in\mathbb P|, |\nu_q(p) = \begin{cases}1,&amp;p=q\\0,&amp;p\neq q\end{cases}|
is the <a href="https://en.wikipedia.org/wiki/P-adic_valuation">p-adic valuation</a>.</li>
<li>It follows that the |p|-adic absolute value |\vert r\vert_p = p^{-\nu_p(r)}| is completely
multiplicative. It is characterized by its values on primes:
|\vert p\vert_q = \begin{cases}p^{-1},&amp;p=q\\1,&amp;p\neq q\end{cases}|.</li>
<li>|\gcd({-}, k)| for a fixed |k| is multiplicative. Given any multiplicative function |f|,
|f \circ \gcd({-},k)| is multiplicative. This essentially “restricts” |f| to only see the prime
powers that divide |k|. Viewing the finite multiset of primes |P| as a function |\mathbb P\to\mathbb N|,
|f(\gcd(p^n,n_P)) = \begin{cases}f(p^n),&amp;n\leq P(p)\\f(p^{P(p)}),&amp;n&gt;P(p)\end{cases}|.</li>
<li>The multiplicative function characterized by |a(p^n) = p(n)| where |p(n)| is the <a href="https://en.wikipedia.org/wiki/Partition_function_(number_theory)">partition function</a>
counts the number of abelian groups of the given order (up to isomorphism). That this function is multiplicative is a
consequence of the fundamental theorem of finite abelian groups.</li>
<li>The <a href="https://en.wikipedia.org/wiki/Jacobi_symbol">Jacobi symbol</a> |\left(\frac{a}{n}\right)|
where |a\in\mathbb Z| and |n| is an <em>odd</em> positive integer is a completely multiplicative
function with either |a| or |n| fixed. When |n| is an odd prime, it reduces to the
<a href="https://en.wikipedia.org/wiki/Legendre_symbol">Legendre symbol</a>. For |p| an odd prime, we have
|\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod p| (Euler’s criterion). This will always be in
|\{-1, 0, 1\}| and can be alternately defined as
|\left(\frac{a}{p}\right) = \begin{cases}0,&amp;p\mid a\\1,&amp;p\nmid a\text{ and }\exists x.x^2\equiv a\pmod p\\-1,&amp;\not\exists x.x^2\equiv a\pmod p\end{cases}|.
Therefore, |\left(\frac{a}{p}\right)=1| (|=0|) when |a| is a (trivial) <a href="https://en.wikipedia.org/wiki/Quadratic_residue">quadratic residue</a>
mod |p|.</li>
<li>An interesting example which is neither multiplicative nor additive is the <a href="https://en.wikipedia.org/wiki/Arithmetic_derivative">arithmetic derivative</a>.
Let |p\in\mathbb P|. Define |\frac{\partial}{\partial p}(n)| via |\frac{\partial}{\partial p}(p) = 1|,
|\frac{\partial}{\partial p}(q) = 0| for |q\neq p| and |q\in\mathbb P|, and
|\frac{\partial}{\partial p}(nm) = \frac{\partial}{\partial p}(n)m + n\frac{\partial}{\partial p}(m)|.
We can then define |D_S = \sum_{p\in S}\frac{\partial}{\partial p}| for non-empty |S\subseteq\mathbb P|,
which satisfies the same product rule identity. The arithmetic derivative itself is |D_{\mathbb P}|. This perspective views a natural number (or, more
generally, a rational number) as a monomial in infinitely many variables labeled by prime numbers.</li>
<li>A <strong><a href="https://en.wikipedia.org/wiki/Dirichlet_character">Dirichlet character</a> of modulus |m|</strong> is,
by definition, a completely multiplicative function |\chi| satisfying |\chi(n + m) = \chi(n)|
and such that |\chi(n)| is non-zero if and only if |n| is coprime to |m|. For odd |m|, the Jacobi symbol
|\left(\frac{({-})}{m}\right)| is a Dirichlet character of modulus |m|. |\bar 1| is the
Dirichlet character of modulus |1|.</li>
</ul>
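<p>As a quick sanity check on two items in the list above, here is a small Python sketch (the language and all helper names are my own choices, not anything from a library). It compares Euler’s criterion with the case definition of the Legendre symbol for a few small odd primes, and computes the arithmetic derivative |D_{\mathbb P}| by trial division, using the fact that each prime factor |p| of |n|, counted with multiplicity, contributes |n/p|.</p>

```python
def legendre_euler(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # always 0, 1, or p - 1
    return -1 if r == p - 1 else r


def legendre_by_definition(a: int, p: int) -> int:
    """Legendre symbol from the case definition (0, residue, non-residue)."""
    if a % p == 0:
        return 0
    nonzero_squares = {x * x % p for x in range(1, p)}
    return 1 if a % p in nonzero_squares else -1


def arithmetic_derivative(n: int) -> int:
    """D_P(n) for n >= 1: each prime factor p (with multiplicity) contributes n / p."""
    total, m, p = 0, n, 2
    while m > 1:
        while m % p == 0:
            m //= p
            total += n // p
        p += 1
    return total


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        assert all(legendre_euler(a, p) == legendre_by_definition(a, p)
                   for a in range(-20, 40))
    print(arithmetic_derivative(12))  # 16, since D(12) = 12 * (2/2 + 1/3)
```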
<h2 id="dirichlet-series">Dirichlet Series</h2>
<p>Given an arithmetic function |f|, we define the <a href="https://en.wikipedia.org/wiki/Dirichlet_series"><strong>Dirichlet series</strong></a>:</p>
<p>\[\mathcal D[f](s) = \sum_{n=1}^\infty \frac{f(n)}{n^s} = \sum_{n=1}^\infty f(n)n^{-s}\]</p>
<p>When |f| is a Dirichlet character, |\chi|, this is referred to as the <a href="https://en.wikipedia.org/wiki/Dirichlet_L-function">(Dirichlet) |L|-series</a>
of the character, and the analytic continuation is the (Dirichlet) |L|-function and is written
|L(s, \chi)|.</p>
<p>We won’t focus much on when such a series converges. See <a href="https://en.wikipedia.org/wiki/Dirichlet_series#Analytic_properties">this section</a>
of the above Wikipedia article for more details. Alternatively, we could talk about
<a href="https://en.wikipedia.org/wiki/Dirichlet_series#Formal_Dirichlet_series">formal Dirichlet series</a>.
If |s = 0|, then we get
the sum |\sum_{n=1}^\infty f(n)|, which won’t converge for, say, |f = \bar 1|. More generally,
if |f| is asymptotically bounded by |n^k| for some |k|, i.e. |f \in O(n^k)|, then the series
will converge absolutely when the real part of |s| is greater than |k+1|. For |\bar 1|, taking |k = 0|, it follows
that |\mathcal D[\bar 1](x + iy)| is defined when |x &gt; 1|. We can use <a href="https://en.wikipedia.org/wiki/Analytic_continuation">analytic continuation</a>
to go beyond these limits.</p>
<p>See <a href="https://projecteuclid.org/journals/missouri-journal-of-mathematical-sciences/volume-20/issue-1/A-Catalog-of-Interesting-Dirichlet-Series/10.35834/mjms/1316032830.full">A Catalog of Interesting Dirichlet Series</a>
for a more reference-like listing. Beware differences in notation.</p>
<h2 id="dirichlet-convolution">Dirichlet Convolution</h2>
<p>Why is this interesting in this context? Let’s consider two arithmetic functions |f| and |g| and
multiply their corresponding Dirichlet series. We’ll get:</p>
<p>\[\mathcal D[f](s)\mathcal D[g](s) = \sum_{n=1}^\infty h(n)n^{-s} = \mathcal D[h](s)\]</p>
<p>where now we need to figure out what |h(n)| is. But |h(n)| is going to be the sum of all the terms
of the form |f(a)a^{-s}g(b)b^{-s} = f(a)g(b)(ab)^{-s}| where |ab = n|. We can thus write:
\[h(n) = \sum_{ab=n} f(a)g(b) = \sum_{d\mid n} f(d)g(n/d)\] We’ll write this more compactly as
|h = f \star g| which we’ll call <a href="https://en.wikipedia.org/wiki/Dirichlet_convolution"><strong>Dirichlet convolution</strong></a>.
We have thus shown a <strong>convolution theorem</strong> of the form
\[\mathcal D[f]\mathcal D[g] = \mathcal D[f \star g]\]</p>
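<p>Dirichlet convolution is easy to sketch directly from the definition. The following Python is a minimal, unoptimized version: it enumerates divisors with a naive |O(n)| scan, and the function names are hypothetical. It builds |d = \bar 1 \star \bar 1| and |\sigma_1 = \operatorname{id}\star\bar 1|.</p>

```python
from typing import Callable

ArithFn = Callable[[int], int]


def divisors(n: int) -> list:
    """Positive divisors of n in increasing order (naive O(n) scan)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet_convolve(f: ArithFn, g: ArithFn) -> ArithFn:
    """Return f * g with (f * g)(n) = sum of f(d) g(n / d) over divisors d of n."""
    def h(n: int) -> int:
        return sum(f(d) * g(n // d) for d in divisors(n))
    return h


def one(n: int) -> int:       # the constant function bar 1
    return 1


def identity(n: int) -> int:  # id
    return n


num_divisors = dirichlet_convolve(one, one)   # d = bar 1 * bar 1
sigma_1 = dirichlet_convolve(identity, one)   # sigma_1 = id * bar 1
```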
<p>The unit function |\varepsilon| serves as the identity for this operation, which is reflected by
|\mathcal D[\varepsilon](s) = 1|.</p>
<p>In the same way that we can view a sum of the form |\sum_{a+b=n}f(a)g(b)| that arises in “normal”
convolution as a sum along the line |y = n - x|, we can view the sum |\sum_{ab=n}f(a)g(b)| as
a sum along a hyperbola of the form |y = n/x|. For all of
|\sum_{n=1}^\infty\sum_{k=1}^\infty f(n)g(k)|, |\sum_{n=2}^\infty\sum_{k=1}^{n-1} f(k)g(n-k)|,
and |\sum_{n=1}^\infty\sum_{k\mid n}f(k)g(n/k)| we’re including |f(a)g(b)| for every
|(a,b)\in\mathbb N_+\times\mathbb N_+| in the sum exactly once. The difference is whether
we’re grouping the internal sum by rows, diagonals, or hyperbolas. This idea of summing hyperbolas
can be expanded to a computational technique for sums of multiplicative functions called the
<a href="https://en.wikipedia.org/wiki/Dirichlet_hyperbola_method">Dirichlet hyperbola method</a>.</p>
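<p>For the simplest case |f = g = \bar 1|, the hyperbola method counts the lattice points |(a, b)| with |ab \le x|, which is |\sum_{n \le x} d(n)|. It counts the points with |a \le \sqrt x|, then the points with |b \le \sqrt x|, and subtracts the |\lfloor\sqrt x\rfloor^2| points that were counted twice. Here is a Python sketch of that one special case. It is not a general implementation of the method.</p>

```python
from math import isqrt


def divisor_summatory_rows(x: int) -> int:
    """Sum of d(n) for n = 1..x, counting lattice points (a, b) with a * b not exceeding x, row by row."""
    return sum(x // a for a in range(1, x + 1))


def divisor_summatory_hyperbola(x: int) -> int:
    """Same count via the hyperbola trick: points with a up to sqrt(x), plus points with
    b up to sqrt(x), minus the square [1, sqrt(x)]^2 that was counted twice."""
    r = isqrt(x)
    return 2 * sum(x // k for k in range(1, r + 1)) - r * r
```

<p>The second version needs only |O(\sqrt x)| terms instead of |O(x)|.</p>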
<p>Since we will primarily be interested in multiplicative functions, we should check that
|f \star g| is a multiplicative function when |f| and |g| are.</p>
<p><strong>Lemma</strong>: Assume |a| and |b| are coprime, and |f| and |g| are multiplicative. Then
|(f \star g)(ab) = (f \star g)(a)(f \star g)(b)|.</p>
<p><strong>Proof</strong>: Since |a| and |b| are coprime, they share <em>no</em> divisors besides |1|. This means every |d|
such that |d \mid ab| factors uniquely as |d = d_a d_b| where |d_a \mid a| and |d_b \mid b|. More
strongly, writing |D_n = \{ d \in \mathbb N_+ \mid d \mid n\}|, for any coprime pair of
numbers |i| and |j| we have a bijection |D_i \times D_j \cong D_{ij}| given by |(d_i, d_j)\mapsto d_i d_j|, and every pair
|(d_i, d_j) \in D_i \times D_j| is coprime<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Thus,</p>
<p>\[\begin{flalign}
(f \star g)(ab)
&amp; = \sum_{d \in D_{ab}} f(d)g((ab)/d) \tag{by definition} \\
&amp; = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a d_b)g((ab)/(d_a d_b)) \tag{via the bijection} \\
&amp; = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{f and g are multiplicative} \\
&amp; = \sum_{d_a \in D_a} \sum_{d_b \in D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{sum over a Cartesian product} \\
&amp; = \sum_{d_a \in D_a} f(d_a)g(a/d_a) \sum_{d_b \in D_b} f(d_b)g(b/d_b) \tag{undistributing} \\
&amp; = \sum_{d_a \in D_a} f(d_a)g(a/d_a) (f \star g)(b) \tag{by definition} \\
&amp; = (f \star g)(b) \sum_{d_a \in D_a} f(d_a)g(a/d_a) \tag{undistributing} \\
&amp; = (f \star g)(b) (f \star g)(a) \tag{by definition} \\
&amp; = (f \star g)(a) (f \star g)(b) \tag{commutativity of multiplication}
\end{flalign}\] |\square|</p>
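<p>It is <em>not</em> the case that the Dirichlet convolution of two <em>completely</em> multiplicative functions is
<em>completely</em> multiplicative. For example, |\bar 1| is completely multiplicative, but the divisor function |d = \bar 1 \star \bar 1|, discussed below, has |d(4) = 3 \neq 4 = d(2)^2|.</p>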
<p>We can already start to do some interesting things with this. First, we see that
|\mathcal D[\bar 1] = \zeta|, the <a href="https://en.wikipedia.org/wiki/Riemann_zeta_function"><strong>Riemann zeta function</strong></a>.
Now consider |(\bar 1 \star \bar 1)(n) = \sum_{k \mid n} 1 = d(n)|.
|d(n)| is the <a href="https://en.wikipedia.org/wiki/Divisor_function">divisor function</a>
which counts the number of divisors of |n|. We see that
|\mathcal D[d](s) = \zeta(s)^2|. A simple but useful fact is
|\zeta(s - z) = \mathcal D[({-})^z](s)|. This directly generalizes the result for
|\mathcal D[\bar 1]| and also implies |\mathcal D[\operatorname{id}](s) = \zeta(s - 1)|.</p>
<p>Generalizing in a different way, we get the family of functions
|\sigma_k = ({-})^k \star \bar 1|. |\sigma_k(n) = \sum_{d \mid n} d^k|.
From the above, we see |\mathcal D[\sigma_k](s) = \zeta(s - k)\zeta(s)|.</p>
<p><strong>Lemma</strong>: Given a completely multiplicative function |f|,
we get |f(n)(g \star h)(n) = (fg \star fh)(n)|.<br/>
<strong>Proof</strong>: \[\begin{flalign}
(fg \star fh)(n)
&amp; = \sum_{d \mid n} f(d)g(d)f(n/d)h(n/d) \\
&amp; = \sum_{d \mid n} f(d)f(n/d)g(d)h(n/d) \\
&amp; = \sum_{d \mid n} f(n)g(d)h(n/d) \\
&amp; = f(n)\sum_{d \mid n} g(d)h(n/d) \\
&amp; = f(n)(g \star h)(n)
\end{flalign}\]
|\square|</p>
<p>As a simple corollary, for a completely multiplicative |f|,
|f \star f = f(\bar 1 \star \bar 1) = fd|.</p>
<h2 id="euler-product-formula">Euler Product Formula</h2>
<p>However, the true power of this is unlocked by the following theorem:</p>
<p><strong>Theorem</strong> (<a href="https://en.wikipedia.org/wiki/Euler_product"><strong>Euler product formula</strong></a>):
Given a multiplicative function |f| which doesn’t grow too fast, e.g. is |O(n^k)| for some |k &gt; 0|,
\[\mathcal D[f](s)
= \sum_{n=1}^\infty f(n)n^{-s}
= \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns}
= \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right)
\]
where the series converges.</p>
<p><strong>Proof</strong>: The last equality simply uses the fact that |f(p^0)p^{-0s} = f(1) = 1| because |f| is
multiplicative. The idea for the main part is similar to how we derived Dirichlet convolution.
When we start to distribute out the infinite product, each term will correspond to the product of
selections of a term from each series. When all but finitely many of those selections select the |1|
term, we get |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s}| where |P| is some finite multiset of
primes induced by those selections. Since |f| is multiplicative and the prime powers |p^k| for distinct |p| are pairwise coprime,
|\prod_{(p, k) \in P}f(p^k)(p^k)^{-s} = f(n_P)n_P^{-s}|. Thus, by unique factorization,
|f(n)n^{-s}| for every positive natural occurs in the sum produced by distributing the right-hand
side exactly once.</p>
<p>In the case where |P| is not a <em>finite</em> multiset, we’ll have \[
\frac{\prod_{(p, k) \in P}f(p^k)}{\left(\prod_{(p, k) \in P}p^k\right)^s}\]</p>
<p>The denominator of this expression goes to infinity when the real part of |s| is greater than |0|.
As long as the numerator doesn’t grow faster than the denominator (perhaps after restricting the
real part of |s| to be greater than some bound), then this product goes to |0|. Therefore, the only
terms that remain are those corresponding to the Dirichlet series on the left-hand side. |\square|</p>
<p>If we assume |f| is <em>completely</em> multiplicative, we can further simplify Euler’s product formula
via the usual sum of a geometric series, |\sum_{n=0}^\infty x^n = (1-x)^{-1}|, to:</p>
<p>\[
\sum_{n=1}^\infty f(n)n^{-s}
= \prod_{p \in \mathbb P}\sum_{n=0}^\infty (f(p)p^{-s})^n
= \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1}
\]</p>
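<p>Here is a rough numerical check of the completely multiplicative form for |f = \bar 1| at |s = 2|. Truncating either the Dirichlet series or the Euler product should approach |\zeta(2) = \pi^2/6|. This Python sketch uses a plain sieve of Eratosthenes, and the truncation bounds are arbitrary choices.</p>

```python
from math import isqrt, pi


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes: all primes up to n (n at least 1)."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, is_p in enumerate(sieve) if is_p]


def zeta_partial_sum(s: float, terms: int) -> float:
    """Truncated Dirichlet series of bar 1: sum of n**(-s) for n = 1..terms."""
    return sum(n ** -s for n in range(1, terms + 1))


def zeta_euler_product(s: float, prime_bound: int) -> float:
    """Truncated Euler product: product of 1 / (1 - p**(-s)) over primes up to prime_bound."""
    result = 1.0
    for p in primes_up_to(prime_bound):
        result /= 1 - p ** -s
    return result


if __name__ == "__main__":
    print(zeta_partial_sum(2.0, 10**5), zeta_euler_product(2.0, 10**4), pi ** 2 / 6)
```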
<p>Now let’s put this to work. The first thing we can see is
|\zeta(s) = \mathcal D[\bar 1](s) = \prod_{p\in\mathbb P}(1 - p^{-s})^{-1}|. But this lets
us write |1/\zeta(s) = \prod_{p\in\mathbb P}(1 - p^{-s})|.
If we look for a multiplicative function that would produce the right-hand side, we see that it must
send a prime |p| to |-1| and |p^n| for |n &gt; 1| to |0|. In other words, it’s the Möbius function
|\mu| we defined before. So |\mathcal D[\mu](s) = 1/\zeta(s)|.</p>
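<p>Likewise, |\mathcal D[\mu](2)| should be |1/\zeta(2) = 6/\pi^2|. The Python sketch below uses hypothetical helper names. It tabulates |\mu| with a sieve that flips the sign once per prime factor and zeroes out multiples of |p^2|, then sums a truncated series.</p>

```python
from math import pi


def mobius_table(n: int) -> list:
    """mu(k) for k = 0..n (index 0 unused): flip the sign once per prime factor,
    then zero out multiples of p**2."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for multiple in range(2 * p, n + 1, p):
                is_prime[multiple] = False
            for multiple in range(p, n + 1, p):
                mu[multiple] = -mu[multiple]
            for multiple in range(p * p, n + 1, p * p):
                mu[multiple] = 0
    return mu


def dirichlet_series_mu(s: float, terms: int) -> float:
    """Truncated D[mu](s) = sum of mu(n) n**(-s) for n = 1..terms."""
    mu = mobius_table(terms)
    return sum(mu[n] * n ** -s for n in range(1, terms + 1))
```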
<p>Using |\mathcal D[d](s) = \zeta(s)^2|, we see that
\[\begin{flalign}
\zeta(s)^2
&amp; = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty p^{-ns}\right)^2 \\
&amp; = \prod_{p\in\mathbb P}\sum_{n=0}^\infty (n+1)p^{-ns} \\
&amp; = \prod_{p\in\mathbb P}\sum_{n=0}^\infty d(p^n)p^{-ns} \\
&amp; = \mathcal D[d](s)
\end{flalign}\]
Therefore, |d(p^n) = n + 1|. This intuitively makes sense because the only divisors of |p^n| are
|p^k| for |k = 0, \dots, n|, and for |a| and |b| coprime
|d(ab) = \vert D_{ab} \vert = \vert D_a \times D_b\vert = \vert D_a\vert\vert D_b\vert = d(a)d(b)|.</p>
<p>Another result leveraging the theorem: given any multiplicative function |f|, we can define a new
multiplicative function via
|f^{[k]}(p^n) = \begin{cases}f(p^m), &amp; km = n\textrm{ for }m\in\mathbb N \\ 0, &amp; k \nmid n\end{cases}|.</p>
<p><strong>Lemma</strong>: The operation just defined has the property that
|\mathcal D[f^{[k]}](s) = \mathcal D[f](ks)|.<br/>
<strong>Proof</strong>:
\[\begin{flalign}
\mathcal D[f^{[k]}](s)
&amp; = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^n)p^{-ns} \\
&amp; = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^{kn})p^{-nks} \\
&amp; = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-nks} \\
&amp; = \mathcal D[f](ks)
\end{flalign}\]
|\square|</p>
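<p>For multiplicative |f|, unwinding the definition shows that |f^{[k]}(n)| is |f(m)| when |n = m^k| and |0| otherwise. This makes the lemma visible term by term: truncating |\mathcal D[f^{[k]}](s)| at |N^k| gives exactly the terms of |\mathcal D[f](ks)| truncated at |N|. Here is a small Python sketch under that reading. The names are mine.</p>

```python
def kth_root_if_power(n: int, k: int):
    """Return m with m**k == n, or None if n (at least 1) is not a perfect k-th power."""
    m = round(n ** (1.0 / k))
    for candidate in (m - 1, m, m + 1):
        if candidate >= 1 and candidate ** k == n:
            return candidate
    return None


def lift(f, k: int):
    """f^[k] for multiplicative f: n -> f(m) if n == m**k, else 0."""
    def f_k(n: int):
        m = kth_root_if_power(n, k)
        return 0 if m is None else f(m)
    return f_k


def dirichlet_partial(f, s: float, terms: int) -> float:
    """Sum of f(n) n**(-s) for n = 1..terms."""
    return sum(f(n) * n ** -s for n in range(1, terms + 1))
```

<p>For instance, |\bar 1^{[2]}| is the indicator function of the perfect squares.</p>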
<h2 id="möbius-inversion">Möbius Inversion</h2>
<p>We can write a sum over some function, |f|, of the divisors of a given natural |n| as
|(f \star \bar 1)(n) = \sum_{d \mid n} f(d)|. Call this |g(n)|. But then we have
|\mathcal D[f \star \bar 1] = \mathcal D[f]\mathcal D[\bar 1] = \mathcal D[f]\zeta| and thus
|\mathcal D[f] = \mathcal D[f]\zeta/\zeta = \mathcal D[(f \star \bar 1) \star \mu]|.
Therefore, if we only have the sums |g(n) = \sum_{d \mid n} f(d)| for some unknown |f|, we can
recover |f| via |f(n) = (g \star \mu)(n) = \sum_{d\mid n}g(d)\mu(n/d)|.
This is <a href="https://en.wikipedia.org/wiki/Mobius_inversion">Möbius inversion</a>.</p>
<p>Formally:</p>
<p>\[g(n) = \sum_{d\mid n} f(d) \iff f(n) = \sum_{d \mid n} \mu(d)g(n/d)\]</p>
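<p>Here is a direct Python sketch of Möbius inversion on a finite range. It uses naive divisor enumeration and a trial-division |\mu|, and the helper names are hypothetical. It computes the divisor sums |g| of some |f|, then recovers |f| as |g\star\mu|.</p>

```python
def divisors(n: int) -> list:
    """Positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n: int) -> int:
    """mu(n) by trial division: 0 if some p**2 divides n, else (-1)**(number of primes)."""
    result, m, p = 1, n, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return result


def divisor_sums(f, n_max: int) -> dict:
    """g(n) = sum of f(d) over divisors d of n, for n = 1..n_max."""
    return {n: sum(f(d) for d in divisors(n)) for n in range(1, n_max + 1)}


def mobius_invert(g: dict, n: int):
    """Recover f(n) = sum of mu(d) g(n / d) over divisors d of n."""
    return sum(mobius(d) * g[n // d] for d in divisors(n))
```

<p>Feeding in |g = \operatorname{id}| instead recovers |\varphi|, which comes up below.</p>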
<p>As a simple example, we clearly have |\zeta(s)/\zeta(s) = 1 = \mathcal D[\varepsilon](s)| so
|\bar 1 \star \mu = \varepsilon| or |\sum_{d \mid n}\mu(d) = 0| for |n &gt; 1| and |1| when |n = 1|.</p>
<p>We also get generalized Möbius inversion via
|\varepsilon(n) = \varepsilon(n)n^k = (\mu\star\bar 1)(n)n^k = (({-})^k\mu\star({-})^k)(n)|, using the lemma above with the
completely multiplicative function |({-})^k|. That is, if |g(n) = \sum_{d\mid n}d^k f(n/d)| then |f(n) = \sum_{d\mid n} \mu(d)d^kg(n/d)|.</p>
<p>By considering logarithms, we also get a multiplicative form of (generalized) Möbius inversion:
\[g(n) = \prod_{d\mid n}f(n/d)^{d^k} \iff f(n) = \prod_{d\mid n}g(n/d)^{\mu(d)d^k}\]</p>
<p><strong>Theorem</strong>: As another guise of Möbius inversion, given any completely multiplicative function |h|,
let |g(m) = \sum_{n=1}^\infty f(mh(n))|. Assuming these sums make sense, we can recover |f(k)|
via |f(k) = \sum_{m=1}^\infty \mu(m)g(kh(m))|.</p>
<p><strong>Proof</strong>:
\[\begin{align}
\sum_{m=1}^\infty \mu(m)g(kh(m))
&amp; = \sum_{m=1}^\infty \mu(m)\sum_{n=1}^\infty f(kh(m)h(n)) \\
&amp; = \sum_{N=1}^\infty \sum_{N=mn} \mu(m)f(kh(N)) \\
&amp; = \sum_{N=1}^\infty f(kh(N)) \sum_{N=nm} \mu(m) \\
&amp; = \sum_{N=1}^\infty f(kh(N)) (\mu\star\bar 1)(N) \\
&amp; = \sum_{N=1}^\infty f(kh(N)) \varepsilon(N) \\
&amp; = f(k)
\end{align}\] |\square|</p>
<p>This will often show up in sums of the form |\sum_n r(x^{1/n})| or |\sum_n r(x^{1/n})/n|, i.e. with |h(n)=n^{-1}|
and |f_x(k) = r(x^k)| or |f_x(k) = kr(x^k)|. Typically, we’ll then be computing
|f_x(1) = r(x)|.</p>
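<p>A classic instance, which I’m choosing here as an illustration, is |r = \pi|, the prime-counting function, together with Riemann’s prime-power counting function |J(x) = \sum_{n=1}^\infty \pi(x^{1/n})/n|. The theorem then gives |\pi(x) = \sum_{m=1}^\infty \mu(m)J(x^{1/m})/m|, and both sums are finite since |\pi(y) = 0| for |y &lt; 2|. The Python sketch below works with exact fractions and integer |k|-th roots, so floating-point rounding can’t change which primes are counted. The function names are hypothetical.</p>

```python
from fractions import Fraction
from math import isqrt


def iroot(x: int, k: int) -> int:
    """Largest integer r with r**k not exceeding x (x at least 1): float guess, exact fix-up."""
    r = max(1, round(x ** (1.0 / k)))
    while r ** k > x:
        r -= 1
    while x >= (r + 1) ** k:
        r += 1
    return r


def prime_pi(y: int) -> int:
    """Number of primes not exceeding y, via a simple sieve."""
    if 2 > y:
        return 0
    sieve = [True] * (y + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(y) + 1):
        if sieve[p]:
            for q in range(p * p, y + 1, p):
                sieve[q] = False
    return sum(sieve)


def mobius(n: int) -> int:
    """mu(n) by trial division."""
    result, m, p = 1, n, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return result


def riemann_j(x: int, m: int) -> Fraction:
    """J(x**(1/m)) = sum over n of pi(x**(1/(m n))) / n; terms vanish once 2**(m n) exceeds x."""
    total, n = Fraction(0), 1
    while True:
        r = iroot(x, m * n)
        if r == 1:
            return total
        total += Fraction(prime_pi(r), n)
        n += 1


def prime_pi_from_j(x: int) -> Fraction:
    """pi(x) = sum over m of mu(m) J(x**(1/m)) / m, i.e. the theorem with h(n) = 1/n."""
    total, m = Fraction(0), 1
    while x >= 2 ** m:
        total += Fraction(mobius(m), m) * riemann_j(x, m)
        m += 1
    return total
```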
<h3 id="lambert-series">Lambert Series</h3>
<p>As a brief aside, it’s worth mentioning <a href="https://en.wikipedia.org/wiki/Lambert_series"><strong>Lambert Series</strong></a>.</p>
<p>Given an arithmetic function |a|, these are series of the form:
\[
\sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n}
= \sum_{n=1}^\infty a(n) \sum_{k=1}^\infty x^{kn}
= \sum_{n=1}^\infty (a \star \bar 1)(n) x^n
\]</p>
<p>Since |\mu\star\bar 1 = \varepsilon| and |\varphi\star\bar 1 = \operatorname{id}|, this leads to:
\[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\]
and:
\[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]</p>
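<p>The Lambert series expansion can be checked coefficient by coefficient. Expanding each |x^n/(1-x^n)| as |x^n + x^{2n} + \cdots| and collecting terms gives |(a\star\bar 1)(m)| as the coefficient of |x^m|. Here is a Python sketch that represents truncated power series as plain lists. The names are mine.</p>

```python
from math import gcd


def mobius(n: int) -> int:
    """mu(n) by trial division."""
    result, m, p = 1, n, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return result


def totient(n: int) -> int:
    """phi(n) by counting c in [n] coprime to n."""
    return sum(1 for c in range(1, n + 1) if gcd(c, n) == 1)


def lambert_coefficients(a, n_max: int) -> list:
    """Coefficients c[0..n_max] of sum_n a(n) x^n / (1 - x^n) as a power series in x.
    Each x^n / (1 - x^n) = x^n + x^(2n) + ..., so c[m] sums a(n) over n dividing m."""
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        for multiple in range(n, n_max + 1, n):
            c[multiple] += a(n)
    return c
```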
<h3 id="inclusion-exclusion">Inclusion-Exclusion</h3>
<p>The Möbius and |\zeta| functions can be generalized to
<a href="https://en.wikipedia.org/wiki/Incidence_algebra">incidence algebras</a>; the versions here come from the
incidence algebra induced by the divisibility order<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.
A notable and relevant example of a Möbius
function for another, closely related, incidence algebra comes from the incidence algebra
induced by finite multisets with the inclusion ordering. For a finite multiset |T|, we get
|\mu(T) = \begin{cases}0,&amp;T\text{ has repeated elements}\\(-1)^{\vert T\vert},&amp;T\text{ is a set}\end{cases}|.
Since we can view a natural number as a finite multiset of primes, and we can always relabel the
elements of a finite multiset with distinct primes, this is equivalent to the Möbius function we’ve
been using.</p>
<p>This leads to a nice and compact way of describing the <a href="https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle#Other_formulas">principle of inclusion-exclusion</a>.
Let |A| and |S| be (finite) multisets with |S \subseteq A| and assume we have |f| and |g| defined
on the set of sub-multisets of |A|. If \[g(A) = \sum_{S\subseteq A} f(S)\] then
\[f(A) = \sum_{S\subseteq A}\mu(A\setminus S)g(S)\] and this is Möbius inversion for this
notion of Möbius function. We can thus take a different perspective on Möbius inversion. If
|P| is a finite multiset of primes, then
\[g(n_P) = \sum_{Q\subseteq P}f(n_Q) \iff f(n_P) = \sum_{Q\subseteq P}\mu(P\setminus Q)g(n_Q)\]
recalling that |Q\subseteq P \iff n_Q \mid n_P| and |n_{P\setminus Q} = n_P/n_Q| when
|Q\subseteq P|.</p>
<p>We get traditional inclusion-exclusion by noting that |\mu(T)=(-1)^{\vert T\vert}| when |T| is a
set, i.e. all elements have multiplicity at most |1|. Let |I| be a finite set and assume we have a
family of finite sets, |\{T_i\}_{i\in I}|. Write |T = \bigcup_{i\in I}T_i| and define
|\bigcap_{i\in\varnothing}T_i = T|.</p>
<p>Define
\[f(J) = \left\vert\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i\right\vert\]
for |J\subseteq I|. In particular, |f(I) = 0|.
|f(J)| is then the number of elements shared by all |T_i| for |i\notin J| and no |T_j| for
|j\in J|. Every |x \in \bigcup_{i\in I}T_i| is associated to exactly one such subset
of |I|, namely |\{j\in I\mid x\notin T_j\}|. Formally,
|x \in \bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i \iff J = \{j\in I\mid x\notin T_j\}|
so the sets |\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i| for distinct |J| are pairwise disjoint and
\[g(J)
= \sum_{S\subseteq J}f(S)
= \left\vert\bigcup_{S\subseteq J}\left(\bigcap_{i\in I\setminus S}T_i\setminus\bigcup_{i \in S}T_i\right)\right\vert
= \left\vert\bigcap_{i\in I\setminus J}T_i\right\vert
\]
for |J \subseteq I|. In particular, |g(I) = \vert\bigcup_{i\in I}T_i\vert|.</p>
<p>By the Möbius inversion formula for finite sets, we thus have:
\[f(J) = \sum_{S\subseteq J}(-1)^{\vert J\vert - \vert S\vert}g(S)\]
which for |J = I| gives:
\[
0
= \sum_{J\subseteq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert
= \left\vert\bigcup_{i\in I}T_i\right\vert + \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert
\]
which is equivalent to the more usual form:
\[\left\vert\bigcup_{i\in I}T_i\right\vert
= \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert - 1}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert
= \sum_{\varnothing\neq J\subseteq I}(-1)^{\vert J\vert + 1}\left\vert\bigcap_{i\in J}T_i\right\vert
\]</p>
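<p>Here is a brute-force Python check of the final form of inclusion-exclusion. It sums over all non-empty |J\subseteq I| using <code>itertools.combinations</code>, which is exponential in |\vert I\vert|, so it is only suitable for small families. The function name is hypothetical.</p>

```python
from itertools import combinations


def union_size_by_inclusion_exclusion(sets: list) -> int:
    """Size of the union of the given finite sets, as the sum over non-empty
    index subsets J of (-1)**(len(J) + 1) times the size of the intersection over J."""
    total = 0
    for size in range(1, len(sets) + 1):
        for chosen in combinations(sets, size):
            total += (-1) ** (size + 1) * len(set.intersection(*chosen))
    return total
```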
<h3 id="varphi">|\varphi|</h3>
<p>An obvious thing to explore is to apply Möbius inversion to various arithmetic functions.
A fairly natural first start is applying Möbius inversion to the identity function. From the above
results, we know that this unknown function |\varphi| will satisfy
|\mathcal D[\varphi](s) = \zeta(s-1)/\zeta(s) = \mathcal D[\operatorname{id}\star\mu](s)|.
We also immediately have the property that |n = \sum_{d \mid n}\varphi(d)|. Using Euler’s
product formula we have:
\[\begin{flalign}
\zeta(s-1)/\zeta(s)
&amp; = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - p^{-s+1}} \\
&amp; = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - pp^{-s}} \\
&amp; = \prod_{p \in \mathbb P} (1 - p^{-s})\sum_{n=0}^\infty p^n p^{-ns} \\
&amp; = \prod_{p \in \mathbb P} \left[\left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-s} p^{-ns}\right)\right] \\
&amp; = \prod_{p \in \mathbb P} \left[\left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-(n + 1)s}\right)\right] \\
&amp; = \prod_{p \in \mathbb P} \left[\left(1 + \sum_{n=1}^\infty p^n p^{-ns}\right) - \left(\sum_{n=1}^\infty p^{n-1} p^{-ns}\right)\right] \\
&amp; = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty (p^n - p^{n-1}) p^{-ns}\right) \\
&amp; = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty \varphi(p^n) p^{-ns}\right) \\
&amp; = \mathcal D[\varphi](s)
\end{flalign}\]</p>
<p>So |\varphi| is the multiplicative function defined by |\varphi(p^n) = p^n - p^{n-1}|.
For |p^n|, we can see that this counts the number of positive integers less than or equal to |p^n|
which are coprime to |p^n|. There are |p^n| positive integers less than or equal to |p^n|, and
every |p|th one is a multiple of |p|, so |p^n/p = p^{n-1}| of them are <em>not</em> coprime to |p^n|. All the
rest are coprime to |p^n| since they don’t have |p| in their prime factorizations and |p^n|
only has |p| in its. We still need to verify that this interpretation is multiplicative. To be clear, we
know that |\varphi| is multiplicative and that this interpretation works for |p^n|. The question
is whether |\varphi(n)| for general |n| meets the above description, i.e. whether the number of
positive integers less than or equal to |n| that are coprime to |n| is multiplicative.</p>
<p><strong>Theorem</strong>: The number of positive integers less than or equal to |n| that are coprime to |n| is multiplicative and is equal to |\varphi(n)|.</p>
<p><strong>Proof</strong>: |\varphi = \mu\star\operatorname{id}|. We have:</p>
<p>\[\begin{flalign}
\varphi(n_P)
&amp; = \sum_{d\mid n_P}\mu(d)\frac{n_P}{d} \\
&amp; = \sum_{Q\subseteq P}\mu(Q)\frac{n_P}{n_Q} \\
&amp; = \sum_{Q\subseteq \mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q}
\end{flalign}\]</p>
<p>We can see an <a href="https://en.wikipedia.org/wiki/Inclusion-exclusion_principle">inclusion-exclusion</a>
pattern. Specifically, let |C_k = \{ c \in [k] \mid \gcd(c, k) = 1\}| be the numbers less than
or equal to |k| and coprime to |k|. Let |S_{k,m} = \{ c \in [k] \mid m \mid c\}|. We have
|S_{k,a} \cap S_{k,b} = S_{k,\operatorname{lcm}(a,b)}|. Also, when |c \mid k|, then
|\vert S_{k,c}\vert = k/c|. |C_{n_P} = [n_P] \setminus \bigcup_{p \in \mathrm{dom}(P)} S_{n_P,p}|
because every number <em>not</em> coprime to |n_P| shares some prime factor with it. Applying
inclusion-exclusion to the union yields
\[\begin{align}
\vert C_{n_P}\vert
&amp; = n_P - \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert+1}\left\vert \bigcap_{p\in Q}S_{n_P,p}\right\vert \\
&amp; = n_P + \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{\prod_{p\in Q}p} \\
&amp; = \sum_{Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q}
\end{align}\]
|\square|</p>
<p>Many of you will already have recognized that this is <a href="https://en.wikipedia.org/wiki/Euler%27s_totient_function">Euler’s totient function</a>.</p>
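<p>The two descriptions of |\varphi| from the proof can be compared directly. One counts the |c \in [n]| coprime to |n|. The other takes the alternating sum over subsets |Q| of the distinct prime factors of |n|. Here is a Python sketch of both. The helper names are mine.</p>

```python
from itertools import combinations
from math import gcd, prod


def totient_by_counting(n: int) -> int:
    """Number of c in [n] with gcd(c, n) == 1."""
    return sum(1 for c in range(1, n + 1) if gcd(c, n) == 1)


def distinct_prime_factors(n: int) -> list:
    """dom(P) for n = n_P, by trial division."""
    primes, m, p = [], n, 2
    while m > 1:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    return primes


def totient_by_inclusion_exclusion(n: int) -> int:
    """Sum over subsets Q of dom(P) of (-1)**len(Q) * n / prod(Q), as in the proof above."""
    primes = distinct_prime_factors(n)
    return sum((-1) ** size * (n // prod(q))
               for size in range(len(primes) + 1)
               for q in combinations(primes, size))
```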
<h2 id="combinatorial-species">Combinatorial Species</h2>
<p>The book <a href="https://www.cambridge.org/core/books/combinatorial-species-and-treelike-structures/D994A1F2877BDE63FF0C9EDE2F9788A8"><em>Combinatorial Species and Tree-Like Structures</em></a>
has many examples where Dirichlet convolutions and Möbius inversion come up<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.
A combinatorial species is a functor |\operatorname{Core}(\mathbf{FinSet})\to\mathbf{FinSet}|.
Any permutation on a finite set can be decomposed into a collection of cyclic permutations.
Let |U| be a finite set of cardinality |n| and |\pi : U \cong U| a permutation of |U|.
For any |u\in U|, there is a smallest |k\in\mathbb N_+| such that |\pi^k(u) = u| where
|\pi^{k+1} = \pi \circ \pi^k| and |\pi^0 = \operatorname{id}|. The |k| elements
|\mathcal O(u)=\{\pi^{i-1}(u)\mid i\in[k]\}| make up a cycle of length |k|, and |\pi|
restricted to |U\setminus\mathcal O(u)| is a permutation on this smaller set. We can inductively pull
out another cycle until we run out of elements. Write |\pi_k| for the number of cycles of length
|k| in the permutation |\pi|. We clearly have |n = \sum_{k=1}^\infty k\pi_k| as every cycle of
length |k| has |k| elements in it.</p>
<p>Write |\operatorname{fix}\pi| for the number of fixed points of |\pi|, i.e. the cardinality of
the set |\{u\in U\mid \pi(u) = u\}|. Clearly, every element that is fixed by |\pi^k| needs
to be in a cycle whose length divides |k|. This leads to the equation:</p>
<p>\[ \operatorname{fix}\pi^k = \sum_{d\mid k} d\pi_d = ((d \mapsto d\pi_d) \star \bar 1)(k)\]</p>
<p>Since |F(\pi^k) = F(\pi)^k| for a combinatorial species |F|, Möbius inversion, as explicitly
stated in Proposition 2.2.3 of <em>Combinatorial Species and Tree-Like Structures</em>, leads to:</p>
<p>\[k(F(\pi))_k
= \sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d)
= (\mu\star(d\mapsto \operatorname{fix}F(\pi^d)))(k) \]</p>
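<p>Here is a small Python sketch of the plain-permutation case, i.e. |F(U) = U| and |F(\pi) = \pi|. That restriction is my simplification. The sketch counts cycles directly and also recovers |\pi_k| from fixed points of powers via |k\pi_k = \sum_{d\mid k}\mu(k/d)\operatorname{fix}\pi^d|. A permutation is a list where entry <code>i</code> is the image of <code>i</code>, and the function names are hypothetical.</p>

```python
def mobius(n: int) -> int:
    """mu(n) by trial division."""
    result, m, p = 1, n, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return result


def compose_power(perm: list, k: int) -> list:
    """perm**k as a list: entry i is the image of i under k applications of perm."""
    result = list(range(len(perm)))
    for _ in range(k):
        result = [perm[i] for i in result]
    return result


def fixed_points(perm: list) -> int:
    return sum(1 for i, image in enumerate(perm) if i == image)


def cycle_counts(perm: list) -> dict:
    """Map each cycle length k to pi_k, the number of k-cycles, by walking cycles."""
    seen = [False] * len(perm)
    counts = {}
    for start in range(len(perm)):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            counts[length] = counts.get(length, 0) + 1
    return counts


def cycles_from_fixed_points(perm: list, k: int) -> int:
    """pi_k by Möbius inversion: k * pi_k = sum over d dividing k of mu(k/d) fix(perm**d)."""
    total = sum(mobius(k // d) * fixed_points(compose_power(perm, d))
                for d in range(1, k + 1) if k % d == 0)
    return total // k
```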
<p>If we Dirichlet convolve both sides of this with |\operatorname{id}|, replacing |F(\pi)| with
|\beta| as it doesn’t matter that this permutation comes from an action of a species, we get:</p>
<p>\[\sum_{d\mid m} d\beta_d\cdot\frac{m}{d}
= m\sum_{d\mid m} \beta_d
= (\varphi\star(d\mapsto \operatorname{fix}\beta^d))(m)\]</p>
<p>This is just using |\varphi = \operatorname{id}\star\mu|. If we choose |m| such that
|\beta^m = \operatorname{id}|, then we get |\sum_{d\mid m} \beta_d = \sum_{k=1}^\infty \beta_k|
because |\beta_k| will be |0| for all the |k| which don’t divide |m|.
This makes the previous equation into equation 2.2 (34) in the book.</p>
<p>Since |n = \sum_{k=1}^\infty k\pi_k| for any permutation |\pi| of an |n|-element set, applying this to the permutation |F(\pi)| of |F([n])| gives:
\[\vert F([n])\vert
= \sum_{k=1}^\infty\sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d)
= \sum_{k=1}^\infty(\mu\star(d\mapsto\operatorname{fix}F(\pi^d)))(k)\]</p>
<p>These equations give us a way to compute some of these divisor sums by looking at the number of
fixed points and cycles of the action of a species, and vice versa. For example, 2.3 (49) is a
series of Dirichlet convolutions connected to weighted species.</p>
<p>Example 12 from this book presents a nice and perhaps surprising identity. The core of it can be
written as: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\]
where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|. We can rewrite this definition as
the characterization |k\rho_k(a) = (\varphi\star a^{({-})})(k)|. Recalling that
|\varphi = \mu \star \operatorname{id}| and |\ln(1-x) = -\sum_{n=1}^\infty x^n/n|, we get
the following derivation:</p>
<p><strong>Theorem</strong>: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\]
where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|.</p>
<p><strong>Proof</strong>:
\[\begin{flalign}
\sum_{k=1}^\infty\ln(1-ax^k)
&amp; = -\sum_{k=1}^\infty\sum_{n=1}^\infty \frac{a^n x^{nk}}{n} \\
&amp; = -\sum_{n=1}^\infty\sum_{k=1}^\infty \frac{a^n x^{nk}}{n} \\
&amp; = -\sum_{N=1}^\infty\sum_{k\mid N} \frac{a^{N/k} x^N}{N/k} \tag{N=nk} \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N} ka^{N/k} \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}(\operatorname{id}\star a^{({-})})(N) \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}(\varepsilon\star\operatorname{id}\star a^{({-})})(N) \tag{the trick} \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\mu\star\operatorname{id}\star a^{({-})})(N) \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\varphi\star a^{({-})})(N) \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}(\varphi\star a^{({-})})(k) \\
&amp; = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}k\rho_k(a) \\
&amp; = -\sum_{k=1}^\infty\rho_k(a)\sum_{n=1}^\infty\frac{x^{nk}}{n} \tag{N=nk again} \\
&amp; = \sum_{k=1}^\infty\rho_k(a) \ln(1-x^k)
\end{flalign}\] |\square|</p>
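<p>Here is a numerical spot check of the identity in Python, truncating both sides at 60 terms with |a = 3| and |x = 0.1|. These are arbitrary choices with |ax^k| small, so both sides converge quickly. As a side note, |\rho_k(a)| is an integer for integer |a|; for example, |\rho_6(2) = 14|.</p>

```python
from math import gcd, log1p


def totient(n: int) -> int:
    """phi(n) by counting c in [n] coprime to n."""
    return sum(1 for c in range(1, n + 1) if gcd(c, n) == 1)


def rho(k: int, a: int) -> float:
    """rho_k(a) = (1/k) * sum over d dividing k of phi(k/d) * a**d."""
    return sum(totient(k // d) * a ** d for d in range(1, k + 1) if k % d == 0) / k


def lhs(a: float, x: float, terms: int) -> float:
    """Truncation of sum_k ln(1 - a x^k)."""
    return sum(log1p(-a * x ** k) for k in range(1, terms + 1))


def rhs(a: int, x: float, terms: int) -> float:
    """Truncation of sum_k rho_k(a) ln(1 - x^k)."""
    return sum(rho(k, a) * log1p(-x ** k) for k in range(1, terms + 1))
```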
<h2 id="derivative-of-dirichlet-series">Derivative of Dirichlet series</h2>
<p>We can easily compute the derivative of a Dirichlet series (assuming sufficiently strong convergence
so we can push the differentiation into the sum):</p>
<p>\[\begin{flalign}
\mathcal D[f]'(s)
&amp; = \frac{d}{ds}\sum_{n=1}^\infty f(n)n^{-s} \\
&amp; = \sum_{n=1}^\infty f(n)\frac{d}{ds}n^{-s} \\
&amp; = \sum_{n=1}^\infty f(n)\frac{d}{ds}e^{-s\ln n} \\
&amp; = \sum_{n=1}^\infty -f(n)\ln n e^{-s\ln n} \\
&amp; = -\sum_{n=1}^\infty f(n)\ln n n^{-s} \\
&amp; = -\mathcal D[f\ln](s)
\end{flalign}\]</p>
<p>This leads to the identity |\frac{d}{ds}\ln\mathcal D[f](s) = \mathcal D[f]'(s)/\mathcal D[f](s) = -\mathcal D[f\ln \star f^{-1}](s)|, where |f^{-1}| is the Dirichlet inverse of |f| discussed below.
For example, since |\bar 1^{-1} = \mu|, we have |-\zeta'(s)/\zeta(s) = \mathcal D[\ln \star \mu](s)|. Using the Euler
product formula, we have |\ln\zeta(s) = -\sum_{p\in\mathbb P}\ln(1-p^{-s})|. Differentiating
this gives
\[\begin{flalign}
\frac{d}{ds}\ln\zeta(s)
&amp; = -\sum_{p\in\mathbb P} p^{-s}\ln p/(1 - p^{-s}) \\
&amp; = -\sum_{p\in\mathbb P} \sum_{k=1}^\infty \ln p (p^k)^{-s} \\
&amp; = -\sum_{n=1}^\infty \Lambda(n) n^{-s} \\
&amp; = -\mathcal D[\Lambda](s)
\end{flalign}\]
where |\Lambda(n) = \begin{cases}\ln p,&amp;p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, &amp; \text{otherwise}\end{cases}|.
|\Lambda|, which is <em>neither</em> a multiplicative nor an additive function, is known as the <a href="https://en.wikipedia.org/wiki/Von_Mangoldt_function">von Mangoldt function</a>.
Just to write it explicitly, the above implies |\Lambda = \ln \star \mu|, i.e. |\Lambda| is the
Möbius inversion of |\ln|. This generalizes to any completely multiplicative |f|: its Dirichlet inverse
is |\mu f|, so |\mathcal D[f]'/\mathcal D[f] = -\mathcal D[f\ln \star \mu f] = -\mathcal D[f(\ln\star\mu)] = -\mathcal D[f\Lambda]|.</p>
<p>We now have multiple perspectives on |\Lambda| which is a kind of “indicator function” for prime
powers.</p>
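<p>Here is a quick Python check of |\Lambda = \ln\star\mu| and of its Möbius-inverse form |\sum_{d\mid n}\Lambda(d) = \ln n|. It uses a direct prime-power test for |\Lambda|, and the helper names are hypothetical. Comparisons use a small tolerance because of floating point.</p>

```python
from math import log


def divisors(n: int) -> list:
    """Positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n: int) -> int:
    """mu(n) by trial division."""
    result, m, p = 1, n, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return result


def von_mangoldt(n: int) -> float:
    """Lambda(n) = ln p if n = p**k with p prime and k at least 1, else 0."""
    if n == 1:
        return 0.0
    p = 2
    while n % p:
        p += 1          # p is now the smallest prime factor of n
    m = n
    while m % p == 0:
        m //= p
    return log(p) if m == 1 else 0.0


def lambda_via_inversion(n: int) -> float:
    """(ln * mu)(n) = sum over d dividing n of mu(d) ln(n/d)."""
    return sum(mobius(d) * log(n // d) for d in divisors(n))
```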
<h2 id="dirichlet-inverse">Dirichlet Inverse</h2>
<p>Let’s say we’re given an arithmetic function |f|, and we want to find an arithmetic function |g|,
which we’ll call the <strong>Dirichlet inverse</strong> of |f|, such that |f \star g = \varepsilon|.
We immediately get |(f \star g)(1) = f(1)g(1) = 1 = \varepsilon(1)|.
So, supposing |f(1)\neq 0|, we can define |g(1) = 1/f(1)|. We then get a recurrence relation for
all the remaining values of |g| via:
\[0 = (f \star g)(n) = f(1)g(n) + \sum_{d \mid n, d\neq 1} f(d)g(n/d)\]
for |n &gt; 1|. Solving for |g(n)|, we have:
\[g(n) = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d)\]
where the right-hand side only requires |g(k)| for |k &lt; n|. If |f| is multiplicative, then
|f(1) = 1| and the inverse of |f| exists.</p>
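<p>The recurrence translates directly into code. Here is a Python sketch that uses exact arithmetic via <code>fractions.Fraction</code> and returns |g(1), \dots, g(N)|. The function name is hypothetical.</p>

```python
from fractions import Fraction


def dirichlet_inverse(f, n_max: int) -> list:
    """g[1..n_max] with f * g = epsilon, via g(1) = 1/f(1) and
    g(n) = -(1/f(1)) * sum over divisors d != 1 of n of f(d) g(n/d).  (g[0] is unused.)"""
    f1 = Fraction(f(1))
    if f1 == 0:
        raise ValueError("f(1) must be non-zero for a Dirichlet inverse to exist")
    g = [Fraction(0)] * (n_max + 1)
    g[1] = 1 / f1
    for n in range(2, n_max + 1):
        s = sum(Fraction(f(d)) * g[n // d] for d in range(2, n + 1) if n % d == 0)
        g[n] = -s / f1
    return g
```

<p>For |f = \bar 1| this reproduces |\mu|, and for |f = \operatorname{id}| it gives |\operatorname{id}\cdot\mu|, matching the completely multiplicative case discussed next.</p>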
<p>If |f| is completely multiplicative, its Dirichlet inverse is |\mu f|. This follows easily from
|f \star \mu f = (\bar 1 \star \mu)f = \varepsilon f = \varepsilon|. As an example, |({-})^z| is
completely multiplicative so its inverse is |({-})^z\mu|. Since the inverse of a Dirichlet
convolution is the convolution of the inverses, we get |\varphi^{-1}(n) = \sum_{d\mid n}d\mu(d)|.
This is not to be confused with |\varphi(n) = (\operatorname{id}\star\mu)(n) = \sum_{d\mid n} d\mu(n/d)|.</p>
<p>Less trivially, the inverse of a multiplicative function is <em>also</em> a multiplicative function.
We can prove it by complete induction on |\mathbb N_+| using the formula for |g| from above.</p>
<p><strong>Theorem</strong>: If |f\star g = \varepsilon|, then |g| is multiplicative when |f| is.</p>
<p><strong>Proof</strong>: Let |n = ab| where |a| and |b| are coprime. If |a| (or, symmetrically, |b|) is equal to
|1|, then since |g(1) = 1/f(1) = 1|, we have |g(1n) = g(1)g(n) = g(n)|. Now assume neither |a| nor
|b| is |1| and, as the induction hypothesis, assume that |g| is multiplicative on all numbers less
than |n|. Using |f(1) = 1| in the recurrence for |g|, we have:
\[\begin{flalign}
g(ab)
&amp; = -\sum_{d\mid ab,d\neq 1}f(d)g(ab/d) \\
&amp; = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_ad_b)g(ab/(d_ad_b)) \\
&amp; = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_a)f(d_b)g(a/d_a)g(b/d_b) \\
&amp; = -\sum_{d_b \mid b,d_b \neq 1}f(d_b)g(a)g(b/d_b)
- \sum_{d_a \mid a,d_a \neq 1}\sum_{d_b \mid b}f(d_a)f(d_b)g(a/d_a)g(b/d_b) \\
&amp; = -g(a)\sum_{d \mid b,d \neq 1}f(d)g(b/d)
- \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a)\sum_{d_b \mid b}f(d_b)g(b/d_b) \\
&amp; = g(a)g(b) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) (f \star g)(b) \\
&amp; = g(a)g(b) - \varepsilon(b)\sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) \\
&amp; = g(a)g(b)
\end{flalign}\] |\square|</p>
<p>Assuming |f| has a Dirichlet inverse, we also have:
\[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\]
immediately from the convolution theorem.</p>
<h2 id="more-examples">More Examples</h2>
<p>Given a multiplicative function |f|:</p>
<p>\[\begin{align}
\mathcal D[f(\gcd({-},n_P))](s)
&amp; = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\sum_{n=0}^\infty f(p^{\min(k,n)})p^{-ns}\right) \\
&amp; = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\frac{f(p^k)p^{-(k+1)s}}{1 - p^{-s}} + \sum_{n=0}^k f(p^n)p^{-ns}\right)
\end{align}\]</p>
<p>As an example, |\eta(s) = (1 - 2^{1-s})\zeta(s) = \mathcal D[f](s)| where
|f(n) = \begin{cases}-1,&amp;2\mid n\\1,&amp;2\nmid n\end{cases}|.</p>
<p>Alternatively, |f(n) = \mu(\gcd(n, 2))| and we can apply the above formula to see:
\[\begin{flalign}
\mathcal D[\mu(\gcd({-},2))](s)
&amp; = \zeta(s)(1-2^{-s})\left(\frac{\mu(2)2^{-2s}}{1 - 2^{-s}} + \sum_{n=0}^1 \mu(2^n)2^{-ns}\right) \\
&amp; = \zeta(s)(1-2^{-s})\left(\frac{-2^{-2s}}{1 - 2^{-s}} + 1 - 2^{-s}\right) \\
&amp; = \zeta(s)(-2^{-2s} + (1 - 2^{-s})^2) \\
&amp; = \zeta(s)(1 - 2^{1-s})
\end{flalign}\]</p>
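<p>The same identity can be checked coefficient by coefficient. Since |1 - 2^{1-s} = 1 - 2\cdot 2^{-s}| has Dirichlet coefficients |1| at |n=1| and |-2| at |n=2|, the coefficients of |\zeta(s)(1-2^{1-s})| are divisor sums of those. In the Haskell sketch below (illustrative names), these should agree with |\mu(\gcd(n,2))| and with the alternating coefficients |(-1)^{n-1}| of |\eta(s)|, and a truncated series at |s=2| should approach |\eta(2) = \pi^2/12|.</p>

```haskell
-- Moebius function by trial division (small n only).
mobius :: Int -> Int
mobius = go 2 1
  where
    go p acc m
      | m == 1 = acc
      | p * p > m = negate acc -- what remains of m is a prime
      | m `mod` (p * p) == 0 = 0 -- p^2 divides m, so m is not squarefree
      | m `mod` p == 0 = go (p + 1) (negate acc) (m `div` p)
      | otherwise = go (p + 1) acc m

-- f(n) = mu(gcd(n, 2)).
f :: Int -> Int
f n = mobius (gcd n 2)

-- Dirichlet coefficients of 1 - 2^(1-s) = 1 - 2 * 2^(-s).
c :: Int -> Int
c 1 = 1
c 2 = -2
c _ = 0

-- Coefficients of zeta(s) (1 - 2^(1-s)): divisor sums of c.
zetaTimesC :: Int -> Int
zetaTimesC n = sum [c d | d <- [1 .. n], n `mod` d == 0]

-- Partial sum of sum_{n >= 1} f(n) n^(-2); eta(2) = pi^2 / 12.
etaAt2 :: Int -> Double
etaAt2 terms = sum [fromIntegral (f n) / fromIntegral n ^ (2 :: Int) | n <- [1 .. terms]]
```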
<h3 id="lambda-and-gamma">|\lambda| and |\gamma|</h3>
<p>Recall that |\lambda| is completely multiplicative and is characterized by |\lambda(p) = -1|.</p>
<p>We can show that |\mathcal D[\lambda](s) = \zeta(2s)/\zeta(s)| which is equivalent to saying
|\bar 1^{(2)} \star \mu = \lambda| or |\lambda\star\bar 1 = \bar 1^{(2)}|.</p>
<p>\[\begin{flalign}
\zeta(2s)/\zeta(s)
&amp; = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{1-(p^{-s})^2} \\
&amp; = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{(1-p^{-s})(1+p^{-s})} \\
&amp; = \prod_{p\in\mathbb P} (1 + p^{-s})^{-1} \\
&amp; = \prod_{p\in\mathbb P} (1 - \lambda(p)p^{-s})^{-1} \\
&amp; = \mathcal D[\lambda](s)
\end{flalign}\]</p>
<p>We have |\lambda\mu = \vert\mu\vert = \mu\mu|, which is the inverse of |\lambda|, so
|\mathcal D[\vert\mu\vert](s) = \zeta(s)/\zeta(2s)|.</p>
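<p>Both identities are easy to spot-check on coefficients. This Haskell sketch (illustrative names, trial division) computes |\lambda(n) = (-1)^{\Omega(n)}| and |\vert\mu\vert| from prime factorizations, reading |\bar 1^{(2)}| as the indicator function of the perfect squares (its Dirichlet series is |\zeta(2s)|).</p>

```haskell
-- Divisors of n >= 1, by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d).
dirichlet :: Num a => (Int -> a) -> (Int -> a) -> Int -> a
dirichlet f g n = sum [f d * g (n `div` d) | d <- divisors n]

-- Liouville function: (-1)^Omega(n), counting prime factors with multiplicity.
liouville :: Int -> Int
liouville n = (-1) ^ sum (map snd (factorize n))

-- |mu(n)|: 1 if n is squarefree, else 0.
absMobius :: Int -> Int
absMobius n = if all ((== 1) . snd) (factorize n) then 1 else 0

-- Indicator function of the perfect squares (all exponents even).
squareIndicator :: Int -> Int
squareIndicator n = if all (even . snd) (factorize n) then 1 else 0
```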
<p>Recall that |\gamma| is multiplicative and is characterized by |\gamma(p^n) = -1|.</p>
<p>\[\begin{flalign}
\mathcal D[\gamma](s)
&amp; = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty \gamma(p^n)p^{-ns}\right) \\
&amp; = \prod_{p \in \mathbb P}\left(1 - \sum_{n=1}^\infty p^{-ns}\right) \\
&amp; = \prod_{p \in \mathbb P}\left(1 - \left(\sum_{n=0}^\infty p^{-ns} - 1\right)\right) \\
&amp; = \prod_{p \in \mathbb P}\left(2 - \frac{1}{1 - p^{-s}}\right) \\
&amp; = \prod_{p \in \mathbb P}\frac{2(1 - p^{-s}) - 1}{1 - p^{-s}} \\
&amp; = \prod_{p \in \mathbb P}\frac{1 - 2p^{-s}}{1 - p^{-s}}
\end{flalign}\]</p>
<p>This implies that |(\gamma\star\mu)(p^n) = \begin{cases}-2, &amp; n=1 \\ 0, &amp; n &gt; 1 \end{cases}|.</p>
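<p>A quick check of this claim, as a Haskell sketch with illustrative names: compute |\gamma(n) = (-1)^{\omega(n)}| from the factorization, convolve with |\mu|, and evaluate at prime powers.</p>

```haskell
-- Divisors of n >= 1, by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d).
dirichlet :: Num a => (Int -> a) -> (Int -> a) -> Int -> a
dirichlet f g n = sum [f d * g (n `div` d) | d <- divisors n]

-- Moebius function: 0 unless n is squarefree, else (-1)^(number of primes).
mobius :: Int -> Int
mobius n
  | any ((> 1) . snd) fs = 0
  | otherwise = (-1) ^ length fs
  where
    fs = factorize n

-- gamma(n) = (-1)^omega(n), where omega counts distinct prime factors.
gammaF :: Int -> Int
gammaF n = (-1) ^ length (factorize n)
```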
<h3 id="indicator-functions">Indicator Functions</h3>
<p>Let |1_{\mathbb P}| be the indicator function for the primes.
We have |\omega = 1_{\mathbb P}\star\bar 1| or |1_{\mathbb P} = \omega\star\mu|. Directly,
|\mathcal D[1_{\mathbb P}](s) = \sum_{p\in\mathbb P}p^{-s}| so we have
|\mathcal D[\omega](s)/\zeta(s) = \sum_{p\in\mathbb P} p^{-s}|.</p>
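<p>The relation |\omega = 1_{\mathbb P}\star\bar 1| just says that |\omega(n)| counts the prime divisors of |n|. A small Haskell check of both forms (illustrative names, trial division):</p>

```haskell
-- Divisors of n >= 1, by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d).
dirichlet :: Num a => (Int -> a) -> (Int -> a) -> Int -> a
dirichlet f g n = sum [f d * g (n `div` d) | d <- divisors n]

-- Moebius function: 0 unless n is squarefree, else (-1)^(number of primes).
mobius :: Int -> Int
mobius n
  | any ((> 1) . snd) fs = 0
  | otherwise = (-1) ^ length fs
  where
    fs = factorize n

-- Indicator function of the primes.
primeIndicator :: Int -> Int
primeIndicator n = case factorize n of
  [(_, 1)] -> 1
  _ -> 0

-- omega(n): the number of distinct primes dividing n.
omega :: Int -> Int
omega = length . factorize
```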
<p><strong>Lemma</strong>: |\mathcal D[1_{\mathbb P}](s)=\sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns)|<br/>
<strong>Proof</strong>: We proceed as follows:
\[\begin{align}
\sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns)
&amp; = \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\left(\prod_{p\in\mathbb P}(1 - p^{-ns})^{-1}\right) \\
&amp; = -\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{p\in\mathbb P}\ln(1 - p^{-ns}) \\
&amp; = \sum_{p\in\mathbb P}\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{k=1}^\infty p^{-kns}/k \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \sum_{N=kn} \frac{\mu(n)}{N}p^{-Ns} \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\mu(n) \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\mu\star\bar 1)(N) \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\varepsilon(N) \\
&amp; = \sum_{p\in\mathbb P} p^{-s} \\
&amp; = \mathcal D[1_{\mathbb P}](s)
\end{align}\] |\square|</p>
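<p>The lemma can also be checked numerically at |s=2|. In the Haskell sketch below (illustrative names), |\zeta(s)| is approximated by a partial sum up to |N-1 = 999| plus the first Euler-Maclaurin tail terms |N^{1-s}/(s-1) + N^{-s}/2|; this approximation is an assumption of the sketch, not something from the text. The series on the right is cut off after 40 terms, since |\ln\zeta(2n)| shrinks roughly like |4^{-n}|, and the left side sums |p^{-2}| over primes below |10^5|, which leaves out less than about |10^{-5}|.</p>

```haskell
-- Moebius function by trial division (small n only).
mobius :: Int -> Int
mobius = go 2 1
  where
    go p acc m
      | m == 1 = acc
      | p * p > m = negate acc -- what remains of m is a prime
      | m `mod` (p * p) == 0 = 0 -- p^2 divides m, so m is not squarefree
      | m `mod` p == 0 = go (p + 1) (negate acc) (m `div` p)
      | otherwise = go (p + 1) acc m

-- Primes by trial division against the primes found so far.
primes :: [Int]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime m = all (\p -> m `mod` p /= 0) (takeWhile (\p -> p * p <= m) primes)

-- Approximate zeta(s) for real s > 1 (an assumption of this sketch):
-- partial sum up to N - 1 plus Euler-Maclaurin corrections for the tail.
zeta :: Double -> Double
zeta s = partial + tailApprox
  where
    bigN = 1000 :: Int
    nd = fromIntegral bigN :: Double
    partial = sum [fromIntegral k ** negate s | k <- [1 .. bigN - 1]]
    tailApprox = nd ** (1 - s) / (s - 1) + nd ** negate s / 2

-- Right-hand side of the lemma, truncated: sum_{n <= terms} mu(n)/n ln zeta(n s).
lemmaRhs :: Int -> Double -> Double
lemmaRhs terms s =
  sum [fromIntegral (mobius n) / fromIntegral n * log (zeta (fromIntegral n * s)) | n <- [1 .. terms]]

-- Left-hand side, truncated: the sum of p^(-s) over primes p < bound.
primeZeta :: Int -> Double -> Double
primeZeta bound s = sum [fromIntegral p ** negate s | p <- takeWhile (< bound) primes]
```

<p>Both <code>primeZeta 100000 2</code> and <code>lemmaRhs 40 2</code> come out close to |0.4522|.</p>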
<p>Let |1_{\mathcal P}| be the indicator function for prime powers.
We have |\Omega = 1_{\mathcal P}\star\bar 1| or |1_{\mathcal P} = \Omega\star\mu|. Directly,
|\mathcal D[1_{\mathcal P}](s) = \sum_{p\in\mathbb P}\sum_{k=1}^\infty p^{-ks} = \sum_{p\in\mathbb P}(p^s - 1)^{-1}| so we have
|\mathcal D[\Omega](s)/\zeta(s) = \sum_{p\in\mathbb P}(p^s - 1)^{-1}|.</p>
<p><strong>Lemma</strong>: |\mathcal D[1_{\mathcal P}](s)=\sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns)|<br/>
<strong>Proof</strong>: This is quite similar to the previous proof.
\[\begin{align}
\sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns)
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\varphi(n) \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\varphi\star\bar 1)(N) \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N} N \\
&amp; = \sum_{p\in\mathbb P}\sum_{N=1}^\infty p^{-Ns} \\
&amp; = \mathcal D[1_{\mathcal P}](s)
\end{align}\] |\square|</p>
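<p>The same numerical approach works here, again as a Haskell sketch with illustrative names and the same assumed |\zeta| approximation (partial sum plus Euler-Maclaurin tail terms). The left side is summed directly over prime powers |p^k &lt; 10^5| with |k\geq 1|, the right side is cut off after 40 terms, and the sketch also checks |\Omega = 1_{\mathcal P}\star\bar 1| for small |n|.</p>

```haskell
-- Divisors of n >= 1, by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d).
dirichlet :: Num a => (Int -> a) -> (Int -> a) -> Int -> a
dirichlet f g n = sum [f d * g (n `div` d) | d <- divisors n]

-- Euler's totient via phi(p^k) = p^(k-1) (p - 1).
totient :: Int -> Int
totient n = product [p ^ (k - 1) * (p - 1) | (p, k) <- factorize n]

-- Primes by trial division against the primes found so far.
primes :: [Int]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime m = all (\p -> m `mod` p /= 0) (takeWhile (\p -> p * p <= m) primes)

-- Approximate zeta(s) for real s > 1 (an assumption of this sketch).
zeta :: Double -> Double
zeta s = partial + tailApprox
  where
    bigN = 1000 :: Int
    nd = fromIntegral bigN :: Double
    partial = sum [fromIntegral k ** negate s | k <- [1 .. bigN - 1]]
    tailApprox = nd ** (1 - s) / (s - 1) + nd ** negate s / 2

-- Right-hand side of the lemma, truncated: sum_{n <= terms} phi(n)/n ln zeta(n s).
lemmaRhs :: Int -> Double -> Double
lemmaRhs terms s =
  sum [fromIntegral (totient n) / fromIntegral n * log (zeta (fromIntegral n * s)) | n <- [1 .. terms]]

-- Left-hand side, truncated: sum of q^(-s) over prime powers q = p^k < bound, k >= 1.
primePowerSum :: Int -> Double -> Double
primePowerSum bound s =
  sum [ fromIntegral (p ^ k) ** negate s
      | p <- takeWhile (< bound) primes
      , k <- takeWhile (\j -> p ^ j < bound) [1 :: Int ..] ]

-- Indicator function of the prime powers p^k with k >= 1.
primePowerIndicator :: Int -> Int
primePowerIndicator n = case factorize n of
  [_] -> 1
  _ -> 0

-- Omega(n): the number of prime factors counted with multiplicity.
bigOmega :: Int -> Int
bigOmega = sum . map snd . factorize
```

<p>Both sides of the lemma come out close to |0.5517| at |s=2|.</p>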
<h2 id="summatory-functions">Summatory Functions</h2>
<p>One thing we’ve occasionally been taking for granted is that the operator |\mathcal D| is
injective. That is, |\mathcal D[f] = \mathcal D[g]| if and only if |f = g|. To show this, we’ll
use the fact that we can (usually) <a href="https://en.wikipedia.org/wiki/Dirichlet_series_inversion">invert</a>
the Mellin transform, which can be viewed roughly as a version of |\mathcal D| that operates on
continuous functions.</p>
<p>Before talking about the Mellin transform, we’ll talk about summatory functions as this will ease
our later discussion.</p>
<p>We will turn a sum into a function of a real variable via a zero-order hold, i.e. we will take the floor
of the input. Thus |\sum_{n\leq x} f(n)| is constant on any interval of the form |[k,k+1)|. It
then (potentially) has jump discontinuities at integer values. The beginning of the sum is at |n=1|
so for all |x&lt;1|, the sum up to |x| is |0|. We will need a slight tweak to better deal with these
discontinuities. This will be indicated by a prime on the summation sign.</p>
<p>For non-integer values of |x|, we have:
\[\sum_{n \leq x}’ f(n) = \sum_{n \leq x} f(n)\]</p>
<p>For |m| an integer, we have:
\[
\sum_{n \leq m}’ f(n)
= \frac{1}{2}\left(\sum_{n&lt;m} f(n) + \sum_{n \leq m} f(n)\right)
= \sum_{n\leq m} f(n) - f(m)/2
\]</p>
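<p>A minimal sketch of the primed convention in Haskell, with illustrative names: at non-integer |x| it is the plain floor-based sum, and at an integer |m| it subtracts |f(m)/2|, i.e. it takes the average of the one-sided values. Exact rationals keep the half-values visible, and the Mertens function serves as the example.</p>

```haskell
-- Moebius function by trial division (small n only).
mobius :: Int -> Int
mobius = go 2 1
  where
    go p acc m
      | m == 1 = acc
      | p * p > m = negate acc -- what remains of m is a prime
      | m `mod` (p * p) == 0 = 0 -- p^2 divides m, so m is not squarefree
      | m `mod` p == 0 = go (p + 1) (negate acc) (m `div` p)
      | otherwise = go (p + 1) acc m

-- Primed summatory function: the plain sum over n <= x when x is not an
-- integer, and the sum minus f(m)/2 when x = m is an integer. It is 0 for x < 1.
summatory' :: (Int -> Rational) -> Rational -> Rational
summatory' f x
  | x < 1 = 0
  | fromIntegral m == x = full - f m / 2
  | otherwise = full
  where
    m = floor x :: Int
    full = sum [f n | n <- [1 .. m]]

-- The Mertens function M(x) with the primed convention.
mertens :: Rational -> Rational
mertens = summatory' (fromIntegral . mobius)
```

<p>For instance, |\sum_{n\leq 10}\mu(n) = -1|, so the primed value at |x=10| is |-1-\mu(10)/2 = -3/2|, halfway between the value |-2| just below |10| and |-1| just above it.</p>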
<p>This kind of thing should be familiar to those who’ve worked with things like Laplace transforms of
<a href="https://en.wikipedia.org/wiki/Heaviside_step_function">discontinuous functions</a>.
(Not for no reason…)</p>
<p>One reason for introducing these summatory functions is that they are a little easier to
work with. Arguably, we want something like |\frac{d}{dx}\sum_{n\leq x}f(n) = \sum_{n=1}^\infty f(n)\delta(n-x)|,
but that means we end up with a bunch of distribution nonsense and even more improper integrals.
The summatory function may be discontinuous, but it at least has a finite value everywhere.
Of course, another reason for introducing these functions is that they are often the quantities
we’re interested in.</p>
<p>Several important functions are summatory functions of this form:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Mertens_function">Mertens function</a>:
|M(x) = \sum_{n\leq x}’ \mu(n)|</li>
<li><a href="https://en.wikipedia.org/wiki/Chebyshev_function">Chebyshev function</a>:
|\vartheta(x) = \sum_{p\leq x, p\in\mathbb P}’ \ln p = \sum_{n\leq x}’ 1_{\mathbb P}(n)\ln n|</li>
<li>Second Chebyshev function:
|\psi(x) = \sum_{n\leq x}’ \Lambda(n) = \sum_{n=1}^\infty \vartheta(x^{1/n})|</li>
<li>The <a href="https://en.wikipedia.org/wiki/Prime_counting_function">prime-counting function</a>:
|\pi(x) = \sum_{n\leq x}’ 1_{\mathbb P}(n)|</li>
<li><a href="https://en.wikipedia.org/wiki/Prime_counting_function#Riemann&#39;s_prime-power_counting_function">Riemann’s prime-power counting function</a>:
|\Pi_0(x) = \sum_{n\leq x}’ \frac{\Lambda(n)}{\ln n}
= \sum_{n=1}^\infty \sum_{p^n\leq x,p\in\mathbb P}’ n^{-1}
= \sum_{n=1}^\infty\pi(x^{1/n})n^{-1}|</li>
<li>|D(x) = \sum_{n\leq x}d(n)|</li>
</ul>
<p>These are interesting in how they <a href="https://en.wikipedia.org/wiki/Prime-counting_function#Other_prime-counting_functions">relate to the prime-counting function</a>.</p>
<p>Let’s consider the arithmetic function |\Lambda/\ln| whose Dirichlet series is |\ln\zeta|.</p>
<p>Consider the summatory function |\sum_{n\leq x}’ \Lambda(n)/\ln(n)|. The function |\Lambda(n)| is |0|
except when |n=p^k| for some |p\in\mathbb P| and |k\in\mathbb N_+|. Therefore, we have
\[\begin{align}
\sum_{n\leq x}’ \frac{\Lambda(n)}{\ln(n)}
&amp; = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\Lambda(p^k)}{\ln(p^k)} \\
&amp; = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\ln(p)}{k\ln(p)} \\
&amp; = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{1}{k} \\
&amp; = \sum_{k=1}^\infty \frac{1}{k} \sum_{p^k\leq x, p\in\mathbb P}’ 1 \\
&amp; = \sum_{k=1}^\infty \frac{1}{k} \sum_{p\leq x^{1/k}, p\in\mathbb P}’ 1 \\
&amp; = \sum_{k=1}^\infty \frac{\pi(x^{1/k})}{k}
\end{align}\]</p>
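<p>This can be checked with exact arithmetic at non-integer |x|, where the primed sums are ordinary sums. In the Haskell sketch below (illustrative names), <code>lambdaOverLog</code> uses |\Lambda(p^k)/\ln(p^k) = 1/k| directly, so no logarithms are involved, and the right side counts the primes |p| with |p^k\leq m| for each |k|, which is |\pi(x^{1/k})| for |x = m + 1/2|.</p>

```haskell
-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Primes by trial division against the primes found so far.
primes :: [Int]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime m = all (\p -> m `mod` p /= 0) (takeWhile (\p -> p * p <= m) primes)

-- Lambda(n)/ln(n): 1/k when n = p^k for a prime p and k >= 1, otherwise 0.
lambdaOverLog :: Int -> Rational
lambdaOverLog n = case factorize n of
  [(_, k)] -> 1 / fromIntegral k
  _ -> 0

-- Left side at x = m + 1/2: the sum of Lambda(n)/ln(n) over n <= m.
lhs :: Int -> Rational
lhs m = sum (map lambdaOverLog [1 .. m])

-- Right side at x = m + 1/2: sum over k >= 1 of pi(x^(1/k))/k, where
-- pi(x^(1/k)) is the number of primes p with p^k <= m (exact integer powers).
rhs :: Int -> Rational
rhs m =
  sum [ fromIntegral (length (takeWhile (\p -> p ^ k <= m) primes)) / fromIntegral k
      | k <- takeWhile (\j -> 2 ^ j <= m) [1 :: Int ..] ]
```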
<p>Consequently, |\ln\zeta(s) = s\mathcal M[\Pi_0](-s)=\mathcal D[\Lambda/\ln](s)|
where |\mathcal M| is the Mellin transform, and the connection to Dirichlet series is described
in the following section.</p>
<h3 id="mellin-transform">Mellin Transform</h3>
<p>The definition of the <a href="https://en.wikipedia.org/wiki/Mellin_transform">Mellin transform</a> and its
inverse are:</p>
<p>\[\mathcal M[f](s) = \int_0^\infty x^s\frac{f(x)}{x}dx\]
\[\mathcal M^{-1}[\varphi](x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}\varphi(s)ds\]</p>
<p>The contour integral is intended to mean the vertical line with real part |c| traversed from
negative to positive imaginary values. Modulo the opposite sign of |s| and the extra factor of |x|,
this is quite similar to a continuous version of a Dirichlet series.</p>
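<p>The Mellin transform is closely related to the <a href="https://en.wikipedia.org/wiki/Two-sided_Laplace_transform">two-sided Laplace transform</a>:
substituting |x = e^{-t}| gives |\mathcal M[f](s) = \int_{-\infty}^\infty e^{-st}f(e^{-t})dt|. A Dirichlet series can be
expressed as a Mellin transform of the corresponding summatory function:</p>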
<p>\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]</p>
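<p>One way to see this directly, for the real part of |s| large enough that the sum and integral can be exchanged: the summatory function vanishes for |x&lt;1|, and the primed convention only changes it at isolated points, which does not affect the integral, so</p>
<p>\[\begin{align}
s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)
&amp; = s\int_1^\infty x^{-s-1}\sum_{n\leq x} f(n)dx \\
&amp; = s\sum_{n=1}^\infty f(n)\int_n^\infty x^{-s-1}dx \\
&amp; = s\sum_{n=1}^\infty f(n)\frac{n^{-s}}{s} \\
&amp; = \mathcal D[f](s)
\end{align}\]</p>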
<p>Using <a href="https://en.wikipedia.org/wiki/Mellin_transform#Properties">Mellin transform properties</a>,
particularly the one for transforming the derivative, we can write the following.</p>
<p>\[\begin{align}
\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)
&amp; \iff \mathcal D[f](1-s) = -(s-1)\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](s-1) \\
&amp; \iff \mathcal D[f](1-s) = \mathcal M\left[x\mapsto \frac{d}{dx}\sum_{n\leq x}’ f(n)\right](s) \\
&amp; \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\frac{d}{dx}\sum_{n\leq x}’ f(n)dx \\
&amp; \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\sum_{n=1}^\infty f(n)\delta(x-n)dx \\
&amp; \iff \mathcal D[f](1-s) = \sum_{n=1}^\infty f(n)n^{s-1} \\
&amp; \iff \mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s}
\end{align}\]</p>
<p>This leads to <a href="https://en.wikipedia.org/wiki/Perron%27s_formula">Perron’s formula</a>:</p>
<p>\[\begin{align}
\sum_{n\leq x}’ f(n)
&amp; = \mathcal M^{-1}[s\mapsto -\mathcal D[f](-s)/s](x) \\
&amp; = \frac{1}{2\pi i}\int_{-c-i\infty}^{-c+i\infty}\frac{\mathcal D[f](-s)}{-s} x^{-s} ds \\
&amp; = -\frac{1}{2\pi i}\int_{c+i\infty}^{c-i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \\
&amp; = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds
\end{align}\]</p>
<p>for which we need to take the <a href="https://en.wikipedia.org/wiki/Cauchy_principal_value">Cauchy principal value</a>
to get something defined. (See also <a href="https://en.wikipedia.org/wiki/Abel%27s_summation_formula">Abel summation</a>.)</p>
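<p>There are side conditions on the convergence of |\mathcal D[f]| for these formulas to be
justified; for Perron’s formula, |c| should be positive and large enough that |\mathcal D[f]| converges
absolutely on the line with real part |c|. See the links for details.</p>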
<p>Many of the operations we’ve described on Dirichlet series follow from Mellin transform properties.
For example, we have |\mathcal M[f]’(s) = \mathcal M[f\ln](s)| generally.</p>
<h2 id="summary">Summary</h2>
<h3 id="properties">Properties</h3>
<h4 id="dirichlet-convolution-1">Dirichlet Convolution</h4>
<p>Dirichlet convolution is |(f\star g)(n) = \sum_{d\mid n} f(d)g(n/d) = \sum_{mk=n} f(m)g(k)|.</p>
<p>Arithmetic functions form a commutative ring with Dirichlet convolution as the multiplication, |\varepsilon| as the
multiplicative unit, and the usual pointwise addition. This is to say that Dirichlet convolution
is commutative, associative, unital, and bilinear.</p>
<p>For |f| completely multiplicative, |f(g\star h) = fg \star fh|.</p>
<h4 id="dirichlet-inverse-1">Dirichlet Inverse</h4>
<p>For any |f| such that |f(1)\neq 0|, there is a |g| such that |f\star g = \varepsilon|, so these functions form a
group under Dirichlet convolution. The multiplicative functions form a subgroup of this group: the
Dirichlet convolution of multiplicative functions is multiplicative, and so is the Dirichlet inverse of a
multiplicative function.</p>
<p>If |f(1) \neq 0|, then |f \star g = \varepsilon| where |g| is defined by the following recurrence:</p>
<p>\[\begin{flalign}
g(1) &amp; = 1/f(1) \\
g(n) &amp; = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d) \tag{for n &gt; 1}
\end{flalign}\]</p>
<p>For a completely multiplicative |f|, its Dirichlet inverse is |\mu f|.</p>
<h4 id="convolution-theorem">Convolution Theorem</h4>
<p>\[\mathcal D[f](s)\mathcal D[g](s) = \mathcal D[f\star g](s)\]</p>
<h4 id="möbius-inversion-1">Möbius Inversion</h4>
<p>\[\varepsilon = \bar 1 \star \mu\]</p>
<p>This means that from a divisor sum |g(n) = \sum_{d\mid n}f(d) = (f\star\bar 1)(n)| for each |n|, we can
recover |f| via |g\star\mu = f\star\bar 1\star\mu = f|. That is,
|f(n)=\sum_{d\mid n}g(d)\mu(n/d)|.</p>
<p>This can be generalized via |({-})^k\mu\star({-})^k = \varepsilon|. In sums, this means when
|g(n)=\sum_{d\mid n}d^k f(n/d)|, then |f(n)=\sum_{d\mid n}\mu(d)d^k g(n/d)|.</p>
<p>Let |h| be a completely multiplicative function.
Given |g(m) = \sum_{n=1}^\infty f(mh(n))|, then |f(n) = \sum_{m=1}^\infty \mu(m)g(nh(m))|.</p>
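<p>Both inversion formulas can be tested on a function with finite support, which keeps every sum finite. In the Haskell sketch below, the test function <code>f</code> (zero outside |1\leq n\leq 50|), the exponent |k=2|, and the choice |h(n) = n^2| are arbitrary assumptions made for illustration.</p>

```haskell
-- Moebius function by trial division (small n only).
mobius :: Int -> Int
mobius = go 2 1
  where
    go p acc m
      | m == 1 = acc
      | p * p > m = negate acc -- what remains of m is a prime
      | m `mod` (p * p) == 0 = 0 -- p^2 divides m, so m is not squarefree
      | m `mod` p == 0 = go (p + 1) (negate acc) (m `div` p)
      | otherwise = go (p + 1) acc m

-- Divisors of n >= 1, by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- An arbitrary test function, zero outside 1..50.
f :: Int -> Integer
f n
  | n >= 1 && n <= 50 = fromIntegral (n * n - 7 * n + 3)
  | otherwise = 0

-- Exponent for the generalized inversion (an arbitrary choice).
kExp :: Int
kExp = 2

-- g(n) = sum_{d | n} d^k f(n/d)
g1 :: Int -> Integer
g1 n = sum [fromIntegral d ^ kExp * f (n `div` d) | d <- divisors n]

-- Recovered f(n) = sum_{d | n} mu(d) d^k g(n/d)
f1 :: Int -> Integer
f1 n = sum [fromIntegral (mobius d) * fromIntegral d ^ kExp * g1 (n `div` d) | d <- divisors n]

-- A completely multiplicative h (an arbitrary choice).
h :: Int -> Int
h n = n * n

-- g(m) = sum_{n >= 1} f(m h(n)); terms with m h(n) > 50 vanish.
g2 :: Int -> Integer
g2 m = sum [f (m * h n) | n <- takeWhile (\j -> m * h j <= 50) [1 ..]]

-- Recovered f(n) = sum_{m >= 1} mu(m) g(n h(m)); g also vanishes above 50.
f2 :: Int -> Integer
f2 n = sum [fromIntegral (mobius m) * g2 (n * h m) | m <- takeWhile (\j -> n * h j <= 50) [1 ..]]
```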
<p>Using the Möbius function for finite multisets and their inclusion ordering, we can recast
Möbius inversion of naturals as Möbius inversion of finite multisets (of primes) à la the following,
where |g = f\star\bar 1|:
\[f(n_P)
= \sum_{Q\subseteq P}\mu(P\setminus Q)g(n_Q)
= \sum_{Q\subseteq P}\mu(n_P/n_Q)g(n_Q)
= \sum_{d\mid n_P}\mu(n_P/d)g(d)
\]</p>
<p>As a nice result, we have:
\[\sum_{n=1}^\infty\ln(1-ax^n) = \sum_{n=1}^\infty\rho_n(a)\ln(1-x^n)\]
where |n\rho_n(a) = (\varphi \star a^{({-})})(n)|.</p>
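<p>This can be checked on power-series coefficients. Using |\ln(1-y) = -\sum_{k\geq 1}y^k/k|, the coefficient of |x^N| on the left is |-\frac{1}{N}\sum_{nk=N}na^k| and on the right is |-\frac{1}{N}\sum_{nk=N}n\rho_n(a)|. The Haskell sketch below (illustrative names) compares them with exact rationals for a few integer values of |a|.</p>

```haskell
-- Divisors of n >= 1, by trial division.
divisors :: Int -> [Int]
divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Euler's totient via phi(p^k) = p^(k-1) (p - 1).
totient :: Int -> Int
totient n = product [p ^ (k - 1) * (p - 1) | (p, k) <- factorize n]

-- rho_n(a) = (phi * a^(-))(n) / n = (1/n) sum_{d | n} phi(d) a^(n/d).
rho :: Integer -> Int -> Rational
rho a n = fromIntegral (sum [fromIntegral (totient d) * a ^ (n `div` d) | d <- divisors n]) / fromIntegral n

-- Coefficient of x^N in sum_{n >= 1} ln(1 - a x^n): -(1/N) sum_{nk = N} n a^k.
lhsCoeff :: Integer -> Int -> Rational
lhsCoeff a bigN = negate (sum [fromIntegral n * fromIntegral a ^ (bigN `div` n) | n <- divisors bigN]) / fromIntegral bigN

-- Coefficient of x^N in sum_{n >= 1} rho_n(a) ln(1 - x^n): -(1/N) sum_{nk = N} n rho_n(a).
rhsCoeff :: Integer -> Int -> Rational
rhsCoeff a bigN = negate (sum [fromIntegral n * rho a n | n <- divisors bigN]) / fromIntegral bigN
```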
<h4 id="dirichlet-series-1">Dirichlet Series</h4>
<p>\[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s}\]</p>
<p>\[\mathcal D[n\mapsto f(n)n^k](s) = \mathcal D[f](s - k)\]</p>
<p>\[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\]
where the inverse on the left is the Dirichlet inverse.</p>
<p>\[\mathcal D[f]’(s) = -\mathcal D[f\ln](s)\]</p>
<p>For a completely multiplicative |f|,
\[\mathcal D[f]’(s)/\mathcal D[f](s) = -\mathcal D[f\Lambda](s)\]
and:
\[\ln\mathcal D[f](s) = \mathcal D[f\Lambda/\ln](s)\]</p>
<p>Dirichlet series as a Mellin transform:</p>
<p>\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]</p>
<p>The corresponding inverse Mellin transform statement is called <strong>Perron’s Formula</strong>:</p>
<p>\[\sum_{n\leq x}’ f(n)
= \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds\]</p>
<h4 id="euler-product-formula-1">Euler Product Formula</h4>
<p>Assuming |f| is multiplicative, we have:</p>
<p>\[\mathcal D[f](s)
= \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns}
= \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right)
\]</p>
<p>When |f| is completely multiplicative, this can be simplified to:</p>
<p>\[\mathcal D[f](s) = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]</p>
<h4 id="lambert-series-1">Lambert Series</h4>
<p>Given an arithmetic function |a|, these are series of the form:
\[
\sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n
\]</p>
<p>\[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\]</p>
<p>\[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]</p>
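<p>These two closed forms can be checked numerically by truncating the series at a point inside the unit disk. In this Haskell sketch (illustrative names), at |x = 1/2| the right-hand sides are |1/2| and |(1/2)/(1/2)^2 = 2|.</p>

```haskell
-- Prime factorization as (prime, exponent) pairs, by trial division.
factorize :: Int -> [(Int, Int)]
factorize = go 2
  where
    go p m
      | m == 1 = []
      | p * p > m = [(m, 1)]
      | m `mod` p == 0 = let (k, m') = strip p m 0 in (p, k) : go (p + 1) m'
      | otherwise = go (p + 1) m
    strip p m k
      | m `mod` p == 0 = strip p (m `div` p) (k + 1)
      | otherwise = (k, m)

-- Moebius function: 0 unless n is squarefree, else (-1)^(number of primes).
mobius :: Int -> Int
mobius n
  | any ((> 1) . snd) fs = 0
  | otherwise = (-1) ^ length fs
  where
    fs = factorize n

-- Euler's totient via phi(p^k) = p^(k-1) (p - 1).
totient :: Int -> Int
totient n = product [p ^ (k - 1) * (p - 1) | (p, k) <- factorize n]

-- Truncated Lambert series: sum_{n=1}^{terms} a(n) x^n / (1 - x^n).
lambert :: Int -> (Int -> Double) -> Double -> Double
lambert terms a x = sum [a n * x ^ n / (1 - x ^ n) | n <- [1 .. terms]]
```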
<h3 id="arithmetic-function-definitions">Arithmetic function definitions</h3>
<p>|f(p^n)=\cdots| implies a multiplicative/additive function, while |f(p)=\cdots| implies a
completely multiplicative/additive function.</p>
<p>|p^z| for |z\in\mathbb C| is completely multiplicative. This includes the identity function
(|z=1|) and |\bar 1| (|z=0|). For any multiplicative |f|, |f\circ \gcd({-},k)| is multiplicative.</p>
<p>|\ln| is completely additive.</p>
<p>Important but neither additive nor multiplicative are the indicator functions for primes
|1_{\mathbb P}| and prime powers |1_{\mathcal P}|.</p>
<p>The following functions are (completely) multiplicative unless otherwise specified.</p>
<p>\[\begin{flalign}
\varepsilon(p) &amp; = 0 \tag{unit function} \\
\bar 1(p) &amp; = 1 = p^0 \\
\mu(p^n) &amp; = \begin{cases}-1, &amp; n = 1 \\ 0, &amp; n &gt; 1\end{cases} \tag{Möbius function} \\
\Omega(p) &amp; = 1 \tag{additive} \\
\lambda(p) &amp; = -1 = (-1)^{\Omega(p)} \tag{Liouville function} \\
\omega(p^n) &amp; = 1 \tag{additive} \\
\gamma(p^n) &amp; = -1 = (-1)^{\omega(p^n)} \\
a(p^n) &amp; = p(n) \tag{p(n) is the partition function} \\
\varphi(p^n) &amp; = p^n - p^{n-1} = p^n(1 - 1/p) = J_1(p^n) \tag{Euler totient function} \\
\sigma_k(p^n) &amp; = \sum_{m=0}^n p^{km} = \sum_{d\mid p^n} d^k = \frac{p^{k(n+1)}-1}{p^k - 1} \tag{last only works for k&gt;0} \\
d(p^n) &amp; = n + 1 = \sigma_0(p^n) \\
f^{[k]}(p^n) &amp; = \begin{cases}f(p^m),&amp; km=n\\0,&amp; k\nmid n\end{cases} \tag{f multiplicative} \\
\Lambda(n) &amp; = \begin{cases}\ln p,&amp;p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, &amp; \text{otherwise}\end{cases} \tag{not multiplicative} \\
J_k(p^n) &amp; = p^{kn} - p^{k(n-1)} = p^{kn}(1 - p^{-k}) \tag{Jordan totient function} \\
\psi_k(p^n) &amp; = p^{kn} + p^{k(n-1)} = p^{kn}(1 + p^{-k}) = J_{2k}(p^n)/J_k(p^n) \tag{Dedekind psi function} \\
\end{flalign}\]</p>
<h3 id="dirichlet-convolutions">Dirichlet convolutions</h3>
<p>\[\begin{flalign}
\varepsilon &amp; = \bar 1 \star \mu \\
\varphi &amp; = \operatorname{id}\star\mu \\
\sigma_z &amp; = ({-})^z \star \bar 1 = \psi_z \star \bar 1^{(2)} \\
\sigma_1 &amp; = \varphi \star d \\
d &amp; = \sigma_0 = \bar 1 \star \bar 1 \\
f \star f &amp; = fd \tag{f completely multiplicative} \\
f\Lambda &amp; = f\ln \star f\mu = f\ln \star f^{-1} \tag{f completely multiplicative, Dirichlet inverse} \\
\lambda &amp; = \bar 1^{(2)} \star \mu \\
\vert\mu\vert &amp; = \lambda^{-1} = \mu\lambda \tag{Dirichlet inverse} \\
2^\omega &amp; = \vert\mu\vert \star \bar 1 \\
\psi_z &amp; = ({-})^z \star \vert\mu\vert \\
\operatorname{fix} \pi^{(-)} &amp; = \bar 1 \star (k \mapsto k\pi_k) \tag{for a permutation} \\
({-})^k &amp; = J_k \star \bar 1
\end{flalign}\]</p>
<p>More Dirichlet convolution identities are <a href="https://en.wikipedia.org/wiki/Dirichlet_ring#Properties_and_examples">here</a>,
though many are trivial consequences of the earlier properties.</p>
<h3 id="dirichlet-series-2">Dirichlet series</h3>
<p>\[\begin{array}{l|ll}
f(n) &amp; \mathcal D[f](s) &amp; \\ \hline
\varepsilon(n) &amp; 1 &amp; \\
\bar 1(n) &amp; \zeta(s) &amp; \\
n &amp; \zeta(s-1) &amp; \\
n^z &amp; \zeta(s-z) &amp; \\
\sigma_z(n) &amp; \zeta(s-z)\zeta(s) &amp; \\
\mu(n) &amp; \zeta(s)^{-1} &amp; \\
\vert\mu(n)\vert &amp; \zeta(s)/\zeta(2s) &amp; \\
\varphi(n) &amp; \zeta(s-1)/\zeta(s) &amp; \\
d(n) &amp; \zeta(s)^2 &amp; \\
\mu(\gcd(n, 2)) &amp; \eta(s) = (1-2^{1-s})\zeta(s) &amp; \\
\lambda(n) &amp; \zeta(2s)/\zeta(s) &amp; \\
\gamma(n) &amp; \prod_{p \in \mathbb P}\frac{1-2p^{-s}}{1-p^{-s}} &amp; \\
f^{[k]}(n) &amp; \mathcal D[f](ks) &amp; \\
f(n)\ln n &amp; -\mathcal D[f]’ (s) &amp; \\
\Lambda(n) &amp; -\zeta’(s)/\zeta(s) &amp; \\
\Lambda(n)/\ln(n) &amp; \ln\zeta(s) &amp; \\
1_{\mathbb P}(n) &amp; \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns) &amp; \\
1_{\mathcal P}(n) &amp; \sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns) &amp; \\
\psi_k(n) &amp; \zeta(s)\zeta(s - k)/\zeta(2s) &amp; \\
J_k(n) &amp; \zeta(s - k)/\zeta(s) &amp;
\end{array}\]</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Viewing natural numbers as multisets, |D_n| is
the set of all sub-multisets of |n|. The isomorphism described is then simply the fact that given
any sub-multiset of the union of two disjoint multisets, we can sort the elements into their
original multisets producing two sub-multisets of the disjoint multisets.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Incidence algebras are a decategorification
of the notion of a <a href="https://en.wikipedia.org/wiki/Category_algebra">category algebra</a>.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>In a different but
related vein, we can provide a combinatorial interpretation of Dirichlet series via Dirichlet
species as described in <a href="https://arxiv.org/abs/math/0503436v2">On the arithmetic product of combinatorial species</a>.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2025-08-22 16:25:34-07:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/arithmetic-functions.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>What difference lists actually are</title>
    <link>https://www.hedonisticlearning.com/posts/functional-lists-are-not-difference-lists.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Purely functional list concatenation, <code>xs ++ ys</code> in Haskell syntax, is well known to be linear time
in the length of the first input and constant time in the length of the second, i.e. <code>xs ++ ys</code> is
O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of
concatenation.</p>
<p>The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions
à la <code>[a] -&gt; [a]</code> or <code>ShowS</code> = <code>String -&gt; String</code> = <code>[Char] -&gt; [Char]</code>. “Concatenation” of “lists”
represented this way is just function composition, which is a constant-time operation. We can lift a
list <code>xs</code> to this representation via the section <code>(xs ++)</code>. This will still lead to O(length xs)
amount of work to apply this function, but a composition of such functions applied to a list will
always result in a fully right associated expression even if the function compositions aren’t
right associated.</p>
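<p>A minimal Haskell sketch of the technique, with illustrative names: composing lifted lists in a left-nested way is still cheap, because applying the result to <code>[]</code> produces right-nested appends.</p>

```haskell
-- A "functional list": a function that prepends its contents to its argument.
type FList a = [a] -> [a]

-- Lift an ordinary list using the section (xs ++).
fromList :: [a] -> FList a
fromList xs = (xs ++)

-- "Concatenation" is just function composition, which is constant time.
append :: FList a -> FList a -> FList a
append = (.)

-- Get an ordinary list back by applying to the empty list.
toList :: FList a -> [a]
toList f = f []

-- Concatenate n singleton lists with a left-nested fold. The composition is
-- ((id . ([1] ++)) . ([2] ++)) . ..., but applying it to [] evaluates
-- [1] ++ ([2] ++ (... ++ [])), so the appends end up right-nested.
leftNested :: Int -> [Int]
leftNested n = toList (foldl append id [fromList [i] | i <- [1 .. n]])
```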
<p>In the last several years, it has become popular to refer to this technique as “difference lists”.
Often no justification is given for this name. When it is given, it is usually a reference to the
idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise
to efficient concatenation, they have almost no similarities.</p>
<!--more-->
<h2 id="functional-lists">Functional Lists</h2>
<p>To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it
is doing is a bit misunderstood and, consequently, oversold<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Let’s see how we would model this approach in an OO
language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it
would be exactly the same for generic lists.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">interface</span> PrependTo <span class="op">{</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  <span class="bu">String</span> <span class="fu">prependTo</span><span class="op">(</span><span class="bu">String</span> end<span class="op">);</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> Compose <span class="kw">implements</span> PrependTo <span class="op">{</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>  <span class="kw">private</span> PrependTo left<span class="op">;</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>  <span class="kw">private</span> PrependTo right<span class="op">;</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>  <span class="kw">public</span> <span class="fu">Compose</span><span class="op">(</span>PrependTo left<span class="op">,</span> PrependTo right<span class="op">)</span> <span class="op">{</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>    <span class="kw">this</span><span class="op">.</span><span class="fu">left</span> <span class="op">=</span> left<span class="op">;</span> <span class="kw">this</span><span class="op">.</span><span class="fu">right</span> <span class="op">=</span> right<span class="op">;</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>  <span class="op">}</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>  <span class="kw">public</span> <span class="bu">String</span> <span class="fu">prependTo</span><span class="op">(</span><span class="bu">String</span> end<span class="op">)</span> <span class="op">{</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="kw">this</span><span class="op">.</span><span class="fu">left</span><span class="op">.</span><span class="fu">prependTo</span><span class="op">(</span><span class="kw">this</span><span class="op">.</span><span class="fu">right</span><span class="op">.</span><span class="fu">prependTo</span><span class="op">(</span>end<span class="op">));</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>  <span class="op">}</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> Prepend <span class="kw">implements</span> PrependTo <span class="op">{</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>  <span class="kw">private</span> <span class="bu">String</span> s<span class="op">;</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>  <span class="kw">public</span> <span class="fu">Prepend</span><span class="op">(</span><span class="bu">String</span> s<span class="op">)</span> <span class="op">{</span> <span class="kw">this</span><span class="op">.</span><span class="fu">s</span> <span class="op">=</span> s<span class="op">;</span> <span class="op">}</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a>  <span class="kw">public</span> <span class="bu">String</span> <span class="fu">prependTo</span><span class="op">(</span><span class="bu">String</span> end<span class="op">)</span> <span class="op">{</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="kw">this</span><span class="op">.</span><span class="fu">s</span> <span class="op">+</span> end<span class="op">;</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a>  <span class="op">}</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>This is just a straight, manual implementation of closures for <code>(.)</code> and <code>(++)</code> (specialized to
strings). Other lambdas not of the above two forms would lead to other implementations of
<code>PrependTo</code>. Suppose, however, that these are the only two forms that actually occur, which is mostly
true in Haskell practice. Then another view on this OO code (to escape back to FP) is that it is an
OOP encoding of the algebraic data type:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">PrependTo</span> <span class="ot">=</span> <span class="dt">Compose</span> <span class="dt">PrependTo</span> <span class="dt">PrependTo</span> <span class="op">|</span> <span class="dt">Prepend</span> <span class="dt">String</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="ot">prependTo ::</span> <span class="dt">PrependTo</span> <span class="ot">-&gt;</span> <span class="dt">String</span> <span class="ot">-&gt;</span> <span class="dt">String</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>prependTo (<span class="dt">Compose</span> left right) end <span class="ot">=</span> prependTo left (prependTo right end)</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>prependTo (<span class="dt">Prepend</span> s) end <span class="ot">=</span> s <span class="op">++</span> end</span></code></pre></div>
<p>We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo
some very minor details (that could be resolved by using the Church-encoded version of this), this
does accurately reflect what’s going on in the technique. <code>Compose</code> is clearly constant time. Less
obviously, applying these functional lists requires traversing this tree of closures – made
into an explicit tree here. In fact, this reveals that this representation could require arbitrarily
large amounts of work for a given size of output. This is due to the fact that prepending an empty
string doesn’t increase the output size but still increases the size of the tree. In practice,
it’s a safe assumption that, on average, at least one character will be prepended per leaf of the
tree, which makes the overhead proportional to the size of the output.</p>
<p>This tree representation is arguably better than the “functional list” representation. It’s less
flexible for producers, but that’s arguably a good thing because we didn’t really want <em>arbitrary</em>
<code>String -&gt; String</code> functions. It’s more flexible for consumers. For example, getting the head of
the list is a relatively efficient operation compared to applying a “functional list” and taking
the head of the result even in an eager language. (Laziness makes both approaches comparably
efficient.) Getting the last element is just the same for the tree version, but, even with laziness,
is much worse for the functional version. More to the point, this concrete representation allows
the concatenation function to avoid adding empty nodes to the tree whereas <code>(.)</code> can’t pattern
match on whether a function is the identity function or not.</p>
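<p>A sketch of the consumer side, repeating the <code>PrependTo</code> type from above so the block stands alone. The smart constructor <code>compose</code> and the functions <code>lastChar</code> and <code>size</code> are illustrative names, not part of the original code: <code>compose</code> drops empty leaves, which <code>(.)</code> cannot do, and <code>lastChar</code> walks down the right spine instead of producing the whole string.</p>

```haskell
-- The defunctionalized representation from above.
data PrependTo = Compose PrependTo PrependTo | Prepend String

prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end

-- Smart constructor: unlike (.), it can see and drop empty leaves.
compose :: PrependTo -> PrependTo -> PrependTo
compose (Prepend "") t = t
compose t (Prepend "") = t
compose l r = Compose l r

-- Last character, if any: look in the right subtree first, then the left.
lastChar :: PrependTo -> Maybe Char
lastChar (Prepend s) = if null s then Nothing else Just (last s)
lastChar (Compose l r) = case lastChar r of
  Nothing -> lastChar l
  found -> found

-- Number of nodes, to observe the effect of compose on the tree.
size :: PrependTo -> Int
size (Prepend _) = 1
size (Compose l r) = 1 + size l + size r
```

<p>Starting from <code>Prepend ""</code> and folding over the leaves <code>"ab"</code>, <code>""</code>, <code>"cd"</code>, <code>""</code>, <code>"e"</code>, <code>compose</code> builds a tree with 5 nodes, while plain <code>Compose</code> builds one with 11.</p>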
<p>This view makes it very clear what the functional version is doing.</p>
<h2 id="difference-lists-in-prolog">Difference Lists in Prolog</h2>
<p>List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>append([]<span class="kw">,</span> <span class="dt">Ys</span><span class="kw">,</span> <span class="dt">Ys</span>)<span class="kw">.</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>append([<span class="dt">X</span><span class="fu">|</span><span class="dt">Xs</span>]<span class="kw">,</span> <span class="dt">Ys</span><span class="kw">,</span> [<span class="dt">X</span><span class="fu">|</span><span class="dt">Zs</span>]) <span class="kw">:-</span> append(<span class="dt">Xs</span><span class="kw">,</span> <span class="dt">Ys</span><span class="kw">,</span> <span class="dt">Zs</span>)<span class="kw">.</span></span></code></pre></div>
<p>For our purposes, viewing this as a function of the first two arguments, this is exactly the usual
functional implementation of list concatenation with exactly the same problems. We could, of course,
encode the defunctionalized version of the functional approach into (pure) Prolog. This would
produce:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>prepend_to(compose(<span class="dt">Xs</span><span class="kw">,</span> <span class="dt">Ys</span>)<span class="kw">,</span> <span class="dt">End</span><span class="kw">,</span> <span class="dt">Zs</span>) <span class="kw">:-</span> prepend_to(<span class="dt">Ys</span><span class="kw">,</span> <span class="dt">End</span><span class="kw">,</span> <span class="dt">End2</span>)<span class="kw">,</span> prepend_to(<span class="dt">Xs</span><span class="kw">,</span> <span class="dt">End2</span><span class="kw">,</span> <span class="dt">Zs</span>)<span class="kw">.</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>prepend_to(prepend(<span class="dt">Xs</span>)<span class="kw">,</span> <span class="dt">End</span><span class="kw">,</span> <span class="dt">Zs</span>) <span class="kw">:-</span> append(<span class="dt">Xs</span><span class="kw">,</span> <span class="dt">End</span><span class="kw">,</span> <span class="dt">Zs</span>)<span class="kw">.</span></span></code></pre></div>
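<p>For comparison, here is a typed Haskell sketch of the same defunctionalized representation. The names <code>DeltaList</code>, <code>Prepend</code>, <code>Compose</code>, <code>prependTo</code>, <code>concatDelta</code> and <code>deltaToList</code> are illustrative choices, not an existing library API. The tree of constructors plays the role of the tree of closures.</p>

```haskell
-- Hypothetical defunctionalized "prepend something" values: each
-- constructor stands for one kind of closure that could have been a ShowS.
data DeltaList a
  = Prepend [a]                          -- like prepend(Xs)
  | Compose (DeltaList a) (DeltaList a)  -- like compose(Xs, Ys)
  deriving Show

-- Interpret the tree against an end list, mirroring prepend_to/3: the right
-- subtree is prepended to the end list first, then the left one in front.
prependTo :: DeltaList a -> [a] -> [a]
prependTo (Prepend xs) end = xs ++ end
prependTo (Compose xs ys) end = prependTo xs (prependTo ys end)

-- Constant-time concatenation: just allocate a node.
concatDelta :: DeltaList a -> DeltaList a -> DeltaList a
concatDelta = Compose

-- Converting back to an ordinary list walks the whole tree.
deltaToList :: DeltaList a -> [a]
deltaToList d = prependTo d []

main :: IO ()
main = putStrLn (deltaToList (concatDelta (Prepend "ab") (concatDelta (Prepend "cd") (Prepend "ef"))))
```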
<p>(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)</p>
<p>However, because this is a logic programming language, we have an additional tool available
that functional languages lack: unification variables. For an imperative (destructive)
implementation of list concatenation, the way we’d support efficient append of linked lists is to
keep pointers to the start <em>and end</em> of the list. To append two lists, we’d simply use the end
pointer of the first list to make its last node point at the start of the second. We’d
then return a pair consisting of the start pointer of the first and the end pointer of the second.</p>
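<p>A minimal Haskell sketch of this imperative idea, using <code>STRef</code>s (from <code>Data.STRef</code>) as the mutable pointers. The cell type and the names <code>singletonM</code>, <code>appendM</code>, <code>toListM</code> and <code>demo</code> are hypothetical, and empty lists are left out to keep it short.</p>

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- A mutable singly linked cell: a value plus a pointer to the next cell.
data Cell s a = Cell a (STRef s (Maybe (Cell s a)))

-- A non-empty mutable list, kept as pointers to its first and last cells.
data MList s a = MList (Cell s a) (Cell s a)

singletonM :: a -> ST s (MList s a)
singletonM x = do
  next <- newSTRef Nothing
  let cell = Cell x next
  return (MList cell cell)

-- Destructive O(1) append: make the last cell of the first list point at
-- the first cell of the second, and return (start of first, end of second).
appendM :: MList s a -> MList s a -> ST s (MList s a)
appendM (MList start1 (Cell _ lastNext)) (MList start2 end2) = do
  writeSTRef lastNext (Just start2)
  return (MList start1 end2)

-- Read the contents by following pointers from the start cell.
toListM :: MList s a -> ST s [a]
toListM (MList start _) = go start
  where
    go (Cell x next) = do
      rest <- readSTRef next
      case rest of
        Nothing -> return [x]
        Just cell -> fmap (x :) (go cell)

-- Build "abc" with two appends, then read it back through the new handle
-- and through the old handle of the first list.
demo :: (String, String)
demo = runST $ do
  a <- singletonM 'a'
  b <- singletonM 'b'
  c <- singletonM 'c'
  ab <- appendM a b
  abc <- appendM ab c
  whole <- toListM abc
  viaOldHandle <- toListM a
  return (whole, viaOldHandle)

main :: IO ()
main = print demo
```

<p>The old handle <code>a</code> now also reads as <code>"abc"</code>: the destructive update is visible through every handle sharing that cell, which is the same lack of persistence that difference lists have.</p>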
<p>This is exactly how Prolog difference lists work, except instead of pointers, we use unification
variables which are more principled. Concretely, we represent a list as a pair of lists, but the
second list will be represented by an <em>unbound</em> unification variable and the first list contains
that same unification variable as a suffix. This pair is often represented using the infix
operator (“functor” in Prolog terminology), <code>-</code>, e.g. <code>Xs - Ys</code>. We could use <code>diff(Xs, Ys)</code> or
some other name. <code>-</code> has no built-in meaning here; it’s essentially just a binary constructor.</p>
<p>At the level of logic, there are no unification variables. The constraints above mean that <code>Xs - Ys</code>
is a list <code>Xs</code> which contains <code>Ys</code> as a suffix.</p>
<p>The name “difference list” is arguably motivated by the definition of concatenation in this
representation.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>concat(<span class="dt">Xs</span> <span class="fu">-</span> <span class="dt">Ys</span><span class="kw">,</span> <span class="dt">Ys</span> <span class="fu">-</span> <span class="dt">Zs</span><span class="kw">,</span> <span class="dt">Xs</span> <span class="fu">-</span> <span class="dt">Zs</span>)<span class="kw">.</span></span></code></pre></div>
<p>This looks a lot like |Xs - Ys + Ys - Zs = Xs - Zs|. If the suffix component of the first argument
is unbound, like it’s supposed to be, then this is a constant-time operation of binding that
component to <code>Ys</code>. If it is bound, then we need to unify, which in the worst case is O(length Ys),
where the length is measured up to either nil or an unbound variable tail<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>We also have the unit of <code>concat</code>, i.e. the empty
difference list, via<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>empty(<span class="dt">Xs</span> <span class="fu">-</span> <span class="dt">Xs</span>)<span class="kw">.</span></span></code></pre></div>
<p>See the footnote, but this does in some way identify <code>Xs - Ys</code> with the “difference” of <code>Xs</code> and
<code>Ys</code>.</p>
<p>We get back to a “normal” list via:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>to_list(<span class="dt">Xs</span> <span class="fu">-</span> []<span class="kw">,</span> <span class="dt">Xs</span>)<span class="kw">.</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co">% or more generally,</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>prepend_to(<span class="dt">Xs</span> <span class="fu">-</span> <span class="dt">Ys</span><span class="kw">,</span> <span class="dt">Ys</span><span class="kw">,</span> <span class="dt">Xs</span>)<span class="kw">.</span></span></code></pre></div>
<p><code>to_list</code> is a constant-time operation, no matter what. Note that <code>to_list</code> binds the suffix component
of the difference list. This means that the first input no longer meets our condition to be a
difference list. In other words, <code>to_list</code> (and <code>prepend_to</code>) <em>consumes</em> the difference list.
More precisely, it constrains the possible suffixes the list could be.
Indeed, any operation that binds the suffix component of a difference list consumes it. For example,
<code>concat</code> consumes its first argument.</p>
<p>Of course, it still makes logical sense to work with the difference list when its suffix component
is bound, it’s just that its operational interpretation is different. More to the point, given a
difference list, you cannot prepend it (via <code>prepend_to</code> or <code>concat</code>) to two different lists to get
two different results.</p>
<p>Converting from a list does require traversing the list since we need to replace the nil node, i.e.
<code>[]</code>, with a fresh unbound variable. Luckily, this is exactly what <code>append</code> does.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>from_list(<span class="dt">Xs</span><span class="kw">,</span> <span class="dt">Ys</span> <span class="fu">-</span> <span class="dt">Zs</span>) <span class="kw">:-</span> append(<span class="dt">Xs</span><span class="kw">,</span> <span class="dt">Zs</span><span class="kw">,</span> <span class="dt">Ys</span>)<span class="kw">.</span></span></code></pre></div>
<p><code>from_list</code> also suggests this “difference list” idea. If all of <code>Xs</code>, <code>Ys</code>, and <code>Zs</code> are ground
terms, then <code>from_list(Xs, Ys - Zs)</code> holds exactly when <code>append(Xs, Zs, Ys)</code> holds, and in that
case our invariant, that <code>Zs</code> is a suffix of <code>Ys</code>, is satisfied. Writing these relations more
functionally and writing <code>append</code> as addition, we’d have:</p>
<p>\[\mathtt{from\_list}(Xs) = Ys - Zs \iff Xs + Zs = Ys\]</p>
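<p>The “difference” reading can be checked on ground lists. In this Haskell sketch (the helper names are hypothetical), <code>difference xs ys</code> reads a ground pair <code>Xs - Ys</code>, with <code>Ys</code> a suffix of <code>Xs</code>, as the prefix of <code>Xs</code> left after removing <code>Ys</code>. Under that reading, <code>concat</code>, <code>empty</code>, and the <code>from_list</code> equation above become ordinary list identities. This only models the logical meaning of ground terms; it says nothing about unification or cost.</p>

```haskell
import Data.List (isSuffixOf)

-- Hypothetical ground reading of Xs - Ys: when Ys is a suffix of Xs, it
-- denotes the prefix of Xs left after removing Ys; otherwise it is undefined.
difference :: Eq a => [a] -> [a] -> Maybe [a]
difference xs ys
  | ys `isSuffixOf` xs = Just (take (length xs - length ys) xs)
  | otherwise = Nothing

-- concat(Xs - Ys, Ys - Zs, Xs - Zs), assuming Ys is a suffix of Xs and Zs
-- is a suffix of Ys: gluing the first two prefixes gives the third.
concatLaw :: Eq a => [a] -> [a] -> [a] -> Bool
concatLaw xs ys zs = glued == difference xs zs
  where
    glued = do
      p <- difference xs ys
      q <- difference ys zs
      return (p ++ q)

-- empty(Xs - Xs): a list minus itself is the empty prefix.
emptyLaw :: Eq a => [a] -> Bool
emptyLaw xs = difference xs xs == Just []

-- from_list(Xs) = Ys - Zs  iff  Xs ++ Zs == Ys, for any ground lists.
fromListLaw :: Eq a => [a] -> [a] -> [a] -> Bool
fromListLaw xs ys zs = (difference ys zs == Just xs) == (xs ++ zs == ys)

main :: IO ()
main = do
  let samples = ["", "a", "b", "ab", "ba", "abab"]
  print (concatLaw "abcde" "cde" "e")
  print (all emptyLaw samples)
  print (and [fromListLaw xs ys zs | xs <- samples, ys <- samples, zs <- samples])
```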
<p>If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal)
list with <code>to_list</code>, and then we could use <code>from_list</code> multiple times on that result. This would,
of course, still consume the original difference list. We’d also be paying O(length Xs) for every
duplicate, including to replace the one we just consumed<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<p>That said, we <em>can</em> prepend a list to a difference list without consuming it. We can perform
other actions with the risk of (partially) consuming the list, e.g. indexing into the list. Indexing
into the list would force the list to be at least a certain length, but still allow prepending to
any list that will result in a final list at least that long.</p>
<h2 id="comparison">Comparison</h2>
<p>I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing
enforces that a value of type <code>ShowS</code> actually just appends something to its input. We could use
abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference
lists also need an abstraction barrier to ensure their invariants, though their failure modes are
different. A difference list can’t change what it is based on what it is prepended to.</p>
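<p>A small Haskell illustration of the discrepancy, together with one way an abstraction barrier could look. The wrapper <code>Append</code> and its functions are hypothetical names, and the barrier only works if they live in their own module that hides the constructor.</p>

```haskell
-- ShowS is the Prelude synonym for String -> String, so nothing stops a
-- "ShowS" from doing something other than prepending.
notAnAppend :: ShowS
notAnAppend = reverse

-- A hypothetical abstract wrapper. In a real module, only prepend, andThen,
-- emptyAppend and runAppend would be exported (not the Append constructor),
-- so every value could only ever prepend some fixed string.
newtype Append = Append ShowS

prepend :: String -> Append
prepend s = Append (s ++)

andThen :: Append -> Append -> Append
andThen (Append f) (Append g) = Append (f . g)

emptyAppend :: Append
emptyAppend = Append id

runAppend :: Append -> String -> String
runAppend (Append f) = f

main :: IO ()
main = do
  putStrLn (notAnAppend "abc")
  putStrLn (runAppend (prepend "ab" `andThen` prepend "cd") "!")
```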
<style>
tr.even {
  background-color: var(--panel-heading-bg-color);
}
tr.odd {
  background-color: var(--panel-heading-border-color);
}
</style>
<table>
<colgroup>
<col style="width: 53%" />
<col style="width: 46%" />
</colgroup>
<thead>
<tr>
<th>Functional Representation</th>
<th>Difference Lists</th>
</tr>
</thead>
<tbody>
<tr>
<td>constant-time concatenation</td>
<td>constant-time concatenation</td>
</tr>
<tr>
<td>constant-time conversion from a list (though you pay for it later)</td>
<td>O(n) conversion from a list</td>
</tr>
<tr>
<td>persistent</td>
<td>non-persistent, requires linear use</td>
</tr>
<tr>
<td>represented by a tree of closures</td>
<td>represented by a pair of a list and a unification variable</td>
</tr>
<tr>
<td>O(n) (or worse!) conversion to a list</td>
<td>constant-time conversion to a list</td>
</tr>
<tr>
<td>defunctionalized version can be implemented in pretty much any language</td>
<td>requires at least single-assignment variables</td>
</tr>
<tr>
<td>unclear connection to being the difference of two lists (which two lists?)</td>
<td>mathematical, if non-obvious, connection to being the difference of two (given) lists</td>
</tr>
</tbody>
</table>
<p><br/></p>
<p>As an illustration of the difference between persistent and non-persistent uses, the function:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>double f <span class="ot">=</span> f <span class="op">.</span> f</span></code></pre></div>
<p>is a perfectly sensible function on <code>ShowS</code> values that behaves exactly as you’d expect. On the
other hand:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode prolog"><code class="sourceCode prolog"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>double(<span class="dt">In</span><span class="kw">,</span> <span class="dt">Out</span>) <span class="kw">:-</span> concat(<span class="dt">In</span><span class="kw">,</span> <span class="dt">In</span><span class="kw">,</span> <span class="dt">Out</span>)<span class="kw">.</span></span></code></pre></div>
<p>is nonsense: it forces the list component of <code>In</code> to unify with its own suffix, which fails
the occurs check (if it is enabled; otherwise it creates a cyclic list) except when <code>In</code> is the
empty difference list.</p>
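<p>To make the contrast concrete, here is a runnable Haskell version of the persistent side (only <code>piece</code> is a name introduced here): the same <code>ShowS</code> value is used several times, and each use behaves identically.</p>

```haskell
-- The double from the text, specialised to ShowS (= String -> String).
double :: ShowS -> ShowS
double f = f . f

-- The same ShowS value can be reused any number of times: it is persistent.
piece :: ShowS
piece = showString "ab"

main :: IO ()
main = do
  putStrLn (double piece "")           -- the piece applied twice
  putStrLn (piece (double piece "!"))  -- and reused again, unchanged
```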
<h2 id="conclusion">Conclusion</h2>
<p>I hope I’ve illustrated that the functional representation is not just <em>not</em> difference lists, but
is, in fact, <em>wildly different from</em> difference lists.</p>
<p>This functional representation is enshrined into Haskell via the <code>ShowS</code> type and related functions,
but I’d argue the concrete tree representation is actually clearer and better. The functional
representation is more of a cute trick that allows us to reuse existing functions. Really, <code>ShowS</code>
should have been an abstract type.</p>
<p>Difference lists are an interesting example of how imperative ideas can be incorporated into a
declarative language. That said, difference lists come with some of the downsides of an imperative
approach, namely the lack of persistence.</p>
<p>As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional
representation. Calling it “functional lists” or something like that is, in my opinion, very
ambiguous and potentially misleading. I think the lack of a good name for this is why “difference
lists” started becoming popular. As I’ve argued, using “difference list” in this context is even
more misleading and confusing.</p>
<p>If people really want a name, one option might be “<strong>delta list</strong>”. I don’t think this term is used.
It keeps the intuitive idea that the functional representation represents some “change” to a list,
a collection of deltas that will all be applied at once, but it doesn’t make any false reference to
difference lists. I’m not super into this name; I just want something that isn’t “difference list”
or otherwise misleading.</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>To be clear, it’s still much, <em>much</em>,
better than using plain concatenation.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Such a length relation couldn’t
be written in pure Prolog but can in actual Prolog.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>For those algebraically minded, this almost makes <code>concat</code> and <code>empty</code> into another
monoid except <code>concat</code> is partial, but such a partial monoid is just a category! In other words,
we have a category whose objects are lists and whose homsets are, at most, singletons containing
<code>Xs - Ys</code> for Hom(Xs, Ys). If we maintain our invariant that we have <code>Xs - Ys</code> only when <code>Ys</code> is a
suffix of <code>Xs</code>, this thin category is exactly the category corresponding to the reflexive,
transitive “has suffix” relation. We could generalize this to any monoid via a “factors through”
relation, i.e. |\mathrm{Hom}(m, n)| is inhabited if and only if |\exists p. m = pn| which you can
easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general
monoid, we can have a (potentially) non-thin category by saying |p \in \mathrm{Hom}(m,n)| if and
only if |m = pn|. The category will be thin if and only if the monoid is right-cancellative, i.e. |pn = qn| implies |p = q|. This is
exactly the slice category of the monoid viewed as a one-object category.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>Again, in <em>actual</em> Prolog, we <em>could</em>
make a duplicate without consuming the original, though it would still take O(length Xs) time using
the notion of length mentioned before.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2025-04-12 16:25:24-07:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/functional-lists-are-not-difference-lists.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>Classical First-Order Logic from the Perspective of Categorical Logic</title>
    <link>https://www.hedonisticlearning.com/posts/categorical-logic-and-fol.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional
logic, model theory, and set theory. It is the foundation upon which <strong>ZF</strong>(<strong>C</strong>), which is itself
often taken as the foundation of mathematics, is built. When classical FOL was being established,
there was a lot of study and debate around alternative options. There are a variety of philosophical
and metatheoretic reasons supporting classical FOL as The Right Choice.</p>
<p>This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s
eyes, and when type theory was taking its first stumbling steps.</p>
<p>My focus in this article is on what classical FOL looks like to a modern categorical logician.
This can be neatly summarized as “classical FOL is the internal logic of
a <a href="https://ncatlab.org/nlab/show/first-order+hyperdoctrine">Boolean First-Order Hyperdoctrine</a>.”
Each of the three words in this term, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggests
a distinct axis along which to vary the (class of categorical models of the) logic. <em>All</em> of them
have compelling categorical motivations to be varied.</p>
<!--more-->
<h2 id="boolean">Boolean</h2>
<p>The first and simplest is the term “Boolean”. This is what differentiates the categorical semantics
of classical (first-order) logic from constructive (first-order) logic. Considering arbitrary
first-order hyperdoctrines would give us a form of intuitionistic first-order logic.</p>
<p>It is fairly rare that the categories categorists are interested in are Boolean. For example, most
toposes, all of which give rise to first-order hyperdoctrines, are not Boolean. The assumption that they
are tends to correspond to a kind of “discreteness” that’s often at odds with the purpose of the
topos. For example, a category of sheaves on a topological space is Boolean if and only if every
open subset of that space is closed. This implies, in particular, that such a space is
<a href="https://en.wikipedia.org/wiki/Extremally_disconnected_space">extremally disconnected</a>.</p>
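<p>The criterion can be checked by brute force on finite spaces. The topos of sheaves on a space is Boolean exactly when its frame of open sets is a Boolean algebra, and in that frame the negation of an open set |U| is the largest open set disjoint from |U|. The Haskell sketch below (finite spaces only; all names are hypothetical) computes that negation and compares “excluded middle holds for every open” with “every open is closed”.</p>

```haskell
import Data.List (sort, union, (\\))

-- A finite topological space: its points and its open sets, each open set
-- given as a sorted list. We assume (without checking) that the opens form
-- a topology: they include [] and all points, and are closed under unions
-- and intersections.
data Space = Space { points :: [Int], opens :: [[Int]] }

-- Negation in the frame of opens: the largest open set disjoint from u,
-- i.e. the union of all opens disjoint from u.
neg :: Space -> [Int] -> [Int]
neg sp u = sort (foldr union [] [v | v <- opens sp, all (`notElem` u) v])

-- The frame is Boolean when u joined with neg u is the whole space for
-- every open u (excluded middle for opens).
isBooleanFrame :: Space -> Bool
isBooleanFrame sp = and [sort (u `union` neg sp u) == sort (points sp) | u <- opens sp]

-- Every open set is closed, i.e. its complement is also open.
opensAreClosed :: Space -> Bool
opensAreClosed sp = and [sort (points sp \\ u) `elem` opens sp | u <- opens sp]

sierpinski, discrete, indiscrete :: Space
sierpinski = Space [0, 1] [[], [1], [0, 1]]
discrete = Space [0, 1] [[], [0], [1], [0, 1]]
indiscrete = Space [0, 1] [[], [0, 1]]

main :: IO ()
main = print [ (isBooleanFrame sp, opensAreClosed sp) | sp <- [sierpinski, discrete, indiscrete] ]
```

<p>The Sierpiński space fails both tests, while the indiscrete space passes both, a reminder that “Boolean” here does not mean “discrete”.</p>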
<h2 id="first-order">First-Order</h2>
<p>The next term is “first-order”. As the name suggests, a first-order hyperdoctrine has the
necessary structure to interpret first-order logic. The question, then, is what kind of categories
have this structure and only this structure. The answer, as far as I’m aware, is not many.</p>
<p>Many (classes of) categories have the structure to be first-order hyperdoctrines, but often they have
additional structure as well that it seems odd to ignore. The most notable and interesting example is
toposes. All elementary toposes (which includes all Grothendieck toposes) have the structure to give
rise to a first-order hyperdoctrine. But, famously, they also have the structure to give rise to a
higher order logic. Even more interesting, while Grothendieck toposes, being elementary toposes, technically
<em>do</em> support the necessary structure for first-order logic, the natural morphisms of Grothendieck
toposes, geometric morphisms, <em>do not preserve that structure</em>, unlike the logical functors between
elementary toposes.</p>
<p>The natural internal logic for Grothendieck toposes turns out to
be <a href="https://ncatlab.org/nlab/show/geometric+theory">geometric logic</a>. This is a logic that lacks
universal quantification and implication (and thus negation) but does have <em>infinitary</em> disjunction.
This leads to a logic that is, at least superficially, incomparable to first-order logic. Closely
related logics are regular logic and coherent logic, which are sub-logics of both geometric
logic and first-order logic.</p>
<p>We see, then, that none of the natural logics of toposes is first-order logic: we get examples
that are more powerful than, less powerful than, and incomparable to first-order
logic. Other common classes of categories give other natural logics, such as cartesian logic
from left exact categories and (ordered) linear logics from monoidal categories.
We get the simply typed lambda calculus from cartesian closed categories, which leads
to the next topic.</p>
<h2 id="hyperdoctrine">Hyperdoctrine</h2>
<p>A (posetal) hyperdoctrine essentially takes a category and, for each object in that category, assigns to it
a poset of “predicates” on that object. In many cases, this takes the form of the <strong>Sub</strong> functor assigning
to each object its poset of subobjects. Various versions of hyperdoctrines will require additional structure
on the source category, these posets, and/or the functor itself to interpret various logical connectives.
For example, a <a href="https://ncatlab.org/nlab/show/regular+hyperdoctrine">regular hyperdoctrine</a> requires the
source category to have finite limits, the posets to be meet-semilattices, and the functor to give rise
to monotone maps that preserve finite meets and have left adjoints satisfying certain properties
(Frobenius reciprocity and the Beck-Chevalley condition). This notion of hyperdoctrines is
suitable for regular logic.</p>
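<p>For intuition, the motivating example is <strong>Set</strong> with subsets as predicates (the powerset hyperdoctrine). There, reindexing along |f : A \to B| is preimage, its left adjoint is direct image (interpreting |\exists|), and its right adjoint, which a regular hyperdoctrine does not require but a first-order one does, interprets |\forall|. A brute-force Haskell sketch on finite sets (all names hypothetical):</p>

```haskell
import Data.List (nub, subsequences)

-- Finite sets are lists; a "predicate" on a set is a subset of it.
-- Reindexing (substitution) along f : A -> B is preimage.
preimage :: Eq b => [a] -> (a -> b) -> [b] -> [a]
preimage dom f t = [a | a <- dom, f a `elem` t]

-- Existential quantification along f: direct image.
image :: Eq b => (a -> b) -> [a] -> [b]
image f s = nub (map f s)

-- Universal quantification along f: b holds when its whole fibre lies in s.
dualImage :: (Eq a, Eq b) => [a] -> [b] -> (a -> b) -> [a] -> [b]
dualImage dom cod f s = [b | b <- cod, all (`elem` s) [a | a <- dom, f a == b]]

subset :: Eq a => [a] -> [a] -> Bool
subset xs ys = all (`elem` ys) xs

-- image f s ⊆ t  iff  s ⊆ preimage t, for all subsets s of A and t of B.
leftAdjoint :: (Eq a, Eq b) => [a] -> [b] -> (a -> b) -> Bool
leftAdjoint dom cod f =
  and [ subset (image f s) t == subset s (preimage dom f t)
      | s <- subsequences dom, t <- subsequences cod ]

-- preimage t ⊆ s  iff  t ⊆ dualImage s.
rightAdjoint :: (Eq a, Eq b) => [a] -> [b] -> (a -> b) -> Bool
rightAdjoint dom cod f =
  and [ subset (preimage dom f t) s == subset t (dualImage dom cod f s)
      | s <- subsequences dom, t <- subsequences cod ]

main :: IO ()
main = do
  let dom = [0 .. 5] :: [Int]
      cod = [0 .. 4] :: [Int]
      f x = x `mod` 3   -- lands in cod, but is not surjective onto it
  print (leftAdjoint dom cod f, rightAdjoint dom cod f)
```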
<p>It’s very easy to recognize that these functors are essentially <a href="https://ncatlab.org/nlab/show/indexed+category">indexed</a>
<a href="https://ncatlab.org/nlab/show/%280%2C1%29-category+theory">|(0,1)|-categories</a>. This immediately
suggests that we should consider higher categorical versions or, at the very least, ordinary (1-categorical) indexed categories.</p>
<p>What this means for the logic is that we move from proof-irrelevant logic to proof-relevant logic. We now
have potentially multiple ways a “predicate” could “entail” another “predicate”. We can
<a href="https://doi.org/10.1017/CBO9780511525902.008">present the simply typed lambda calculus in this indexed category manner</a>.
This leads naturally to, and connects with, the categorical semantics of type theories.</p>
<p>Pushing forward to |(\infty, 1)|-categories is also fairly natural, as it’s natural to want to
talk about an entailment holding for distinct but “equivalent” reasons.</p>
<h2 id="summary">Summary</h2>
<p>Moving in all three of these directions simultaneously leads pretty naturally to something like
Homotopy Type Theory (HoTT). HoTT is a naturally constructive (but not anti-classical) type theory
aimed at being an internal language for |(\infty, 1)|-toposes.</p>
<h2 id="why-classical-fol">Why Classical FOL?</h2>
<p>Okay, so why did people pick classical FOL in the first place? It’s not like the concept of, say, a
higher-order logic wasn’t considered at the time.</p>
<p>Classical versus intuitionistic logic was debated at the time, but primarily as a philosophical
argument, and the defense of intuitionism was not very compelling (to me and, evidently, to people at the time).
The focus would probably have been more on (classical) FOL versus second- (or higher-)order logic.</p>
<p>Oversimplifying, the issue with second-order logic is fairly evident from the semantics. There are two
main approaches: Henkin-semantics and full (or standard) semantics. Henkin-semantics keeps the nice
properties of (classical) FOL but fails to get the nice properties, namely categoricity properties,
of second-order logic. This isn’t surprising as Henkin-semantics can be encoded into first-order logic.
It’s essentially syntactic sugar. Full semantics, however, states that the interpretation of predicate
sorts is power sets of (cartesian products of) the domain<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. This leads to massive completeness problems as our metalogical set theory has many, many
ways of building subsets of the domain. There are <a href="https://en.wikipedia.org/wiki/Second-order_logic#Metalogical_results">metatheoretic results</a>
that state that there is no computable set of <em>logical</em> axioms that would give us a sound and
complete theory for second-order logic with respect to full semantics. This aspect is also philosophically
problematic, because we don’t want to need set theory to understand the very formulation of set theory.
Thus Quine’s comment that “second-order logic [was] set theory in sheep’s clothing”.</p>
<p>On the more positive and (meta-)mathematical side, we have results like
<a href="https://ncatlab.org/nlab/show/Lindstr%C3%B6m%27s+theorem">Lindström’s theorem</a> which states that
classical FOL is the strongest logic that simultaneously satisfies (downward) Löwenheim-Skolem
and <a href="https://en.wikipedia.org/wiki/Compactness_theorem">compactness</a>. There’s also a syntactic
result by Lindström which characterizes first-order logic as the strongest logic having a recursively
enumerable set of tautologies and satisfying Löwenheim-Skolem<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<h3 id="the-catch">The Catch</h3>
<p>There’s one big caveat to the above. All of the above results are formulated in traditional model
theory, which means there are various assumptions built into their statements. In the language of
categorical logic, these assumptions can basically be summed up in the statement that the <em>only</em>
category of semantics that traditional model theory considers is <strong>Set</strong>.</p>
<p>This is an utterly bizarre thing to do from the standpoint of categorical logic.</p>
<p>The issues with full semantics follow directly from this choice. If, as categorical logic would
have us do, we considered <em>every</em> category with sufficient structure as a potential category of
semantics, then our theory would not be forced to follow every nook and cranny of <strong>Set</strong>’s
notion of subset to be complete. Valid formulas would need to be true not only in <strong>Set</strong> but
in wildly different categories, e.g. every (Boolean) topos.</p>
<p>These traditional results are also often very specific to <em>classical</em> FOL. Dropping this constraint
of classical logic would lead to an even broader class of models.</p>
<h3 id="categorical-perspective-on-classical-first-order-logic">Categorical Perspective on Classical First-Order Logic</h3>
<p>A <a href="https://ncatlab.org/nlab/show/Boolean+category">Boolean category</a> is just a <a href="https://ncatlab.org/nlab/show/coherent+category">coherent category</a>
where every subobject has a complement. Since <a href="https://ncatlab.org/nlab/show/coherent+functor">coherent functors</a>
preserve complements, we have that the category of Boolean categories is a <em>full</em> subcategory of the
category of coherent categories.</p>
<p>One nice thing about, specifically, <em>classical</em> first-order logic from the perspective of category
theory is the following. Recall that <a href="https://ncatlab.org/nlab/show/coherent+logic">coherent logic</a>
is the fragment of geometric logic with only finitary disjunction. Via <a href="posts/morleyization.html">Morleyization</a>,
we can encode <em>classical</em> first-order logic into coherent logic such that the
categories of models of each are equivalent. This implies that a classical FOL formula is
valid if and only if its encoding is. Morleyization allows us to analyze classical FOL using
the tools of classifying toposes. On the one hand, this once again suggests the importance
of <em>coherent</em> logic, but it also means that we <em>can</em> use categorical tools with <em>classical</em> FOL.</p>
<h2 id="conclusion">Conclusion</h2>
<p>There are certain things that I and, I believe, most logicians take as table stakes for a
(foundational) logic<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.
For example, <em>checking</em> a proof should be computably decidable. For these reasons, I am in
complete accord with early (formal) logicians that classical second-order logic with full semantics
<em>is</em> an unacceptably worse alternative to classical first-order logic.</p>
<p>However, when it comes to statements about the specialness of FOL, a lot of them seem to be more
statements about traditional model theory than FOL itself, and also statements about the philosophical
predilections of the time. I feel that philosophical attitudes among logicians and mathematicians
have shifted a decent amount since the beginning of the 20th century. We have different philosophical
predilections today than then, but they are informed by another hundred years of thought, and they are
more relevant to what is being done today.</p>
<p>Martin-Löf type theory (MLTT) and its progeny also present an alternative path with their own philosophical
and metalogical justifications. I mention this to point out actual cases of foundational frameworks
that a (<em>very</em>) superficial reading of traditional model theory results would seem to have been “ruled
out”. Even if one thinks FOL+<strong>ZFC</strong> (or whatever) is the better foundation, I think it is unreasonable
to assert that MLTT derivatives are unworkable as foundations.</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>It’s worth mentioning that this is exactly
what categorical logic would suggest: our syntactic power objects should be mapped to semantic power
objects.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>While nice, it’s not clear that
compactness and, especially, Löwenheim-Skolem are sacrosanct properties that we’d be unwilling
to do without. Lindström’s first theorem is thus a nice abstract characterization theorem for
classical FOL, but it doesn’t shut the door on considering alternatives even in the context of
traditional model theory.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>I’m totally fine thinking about logics that lack these properties, but
I would never put any of them forward as an acceptable foundational logic.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2024-10-24 17:55:55-07:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/categorical-logic-and-fol.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>Global Rebuilding, Coroutines, and Defunctionalization</title>
    <link>https://www.hedonisticlearning.com/posts/global-rebuilding-coroutines-and-defunctionalization.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>In 1983, Mark Overmars described global rebuilding in <em>The Design of Dynamic Data Structures</em>.
The problem it was aimed at solving was turning the amortized time complexity bounds of
batched rebuilding into worst-case bounds. In <strong>batched rebuilding</strong> we perform a series of
updates to a data structure which may cause the performance of operations to degrade, but
occasionally we expensively rebuild the data structure back into an optimal arrangement.
If the updates don’t degrade performance too much before we rebuild, then we can achieve
our target time complexity bounds in an amortized sense. An update that doesn’t degrade
performance too much is called a <strong>weak update</strong>.</p>
<p>Taking an example from Okasaki’s <em>Purely Functional Data Structures</em>, we can consider a
binary search tree where deletions occur by simply marking the deleted nodes as deleted.
Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced
binary search tree and clean out the nodes marked as deleted at that time. In this case,
the deletions count as weak updates because leaving the deleted nodes in the tree even
when it corresponds to up to half the tree can only mildly impact the time complexity of
other operations. Specifically, if the tree was balanced at the start, then even with half its nodes
marked as deleted it is only about one level deeper than a balanced tree of just the remaining nodes. On the other hand, naive inserts
are <em>not</em> weak updates as they can quickly increase the tree’s depth.</p>
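<p>A minimal Haskell sketch of this batched-rebuilding tree, supporting only construction from a list, lookup, and lazy deletion. It is not Okasaki’s code; the names and the exact rebuild threshold (more dead than live nodes) are illustrative choices.</p>

```haskell
import Data.List (nub, sort)

-- A binary search tree whose nodes carry a flag: True means live,
-- False means the key has been (lazily) deleted.
data Tree a = Leaf | Node (Tree a) a Bool (Tree a)

-- The tree plus counts of live and deleted nodes.
data LazyBST a = LazyBST { tree :: Tree a, live :: Int, dead :: Int }

-- Build a perfectly balanced tree from a sorted list of distinct keys.
build :: [a] -> Tree a
build xs = case splitAt (length xs `div` 2) xs of
  (l, x : r) -> Node (build l) x True (build r)
  (_, []) -> Leaf

fromList :: Ord a => [a] -> LazyBST a
fromList xs = LazyBST (build ks) (length ks) 0
  where
    ks = nub (sort xs)

member :: Ord a => a -> LazyBST a -> Bool
member k = go . tree
  where
    go Leaf = False
    go (Node l x alive r) = case compare k x of
      LT -> go l
      GT -> go r
      EQ -> alive

-- Weak update: mark the key as deleted instead of restructuring the tree.
-- Once more than half the nodes are dead, pay for an O(n) batched rebuild.
delete :: Ord a => a -> LazyBST a -> LazyBST a
delete k t
  | not (member k t) = t
  | dead marked > live marked = rebuild marked
  | otherwise = marked
  where
    marked = LazyBST (go (tree t)) (live t - 1) (dead t + 1)
    go Leaf = Leaf
    go (Node l x alive r) = case compare k x of
      LT -> Node (go l) x alive r
      GT -> Node l x alive (go r)
      EQ -> Node l x False r

-- The expensive step: keep only the live keys and rebalance.
rebuild :: LazyBST a -> LazyBST a
rebuild t = LazyBST (build ks) (length ks) 0
  where
    ks = liveKeys (tree t)

-- In-order traversal, so the live keys come out sorted.
liveKeys :: Tree a -> [a]
liveKeys Leaf = []
liveKeys (Node l x alive r) = liveKeys l ++ [x | alive] ++ liveKeys r

depth :: Tree a -> Int
depth Leaf = 0
depth (Node l _ _ r) = 1 + max (depth l) (depth r)

main :: IO ()
main = do
  let t7 = foldr delete (fromList [1 .. 15 :: Int]) [1 .. 7]
      t8 = delete 8 t7
  print (live t7, dead t7, depth (tree t7))  -- still depth 4, no rebuild yet
  print (live t8, dead t8, depth (tree t8))  -- rebuilt: 7 live keys, depth 3
```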
<p>The idea of global rebuilding is relatively straightforward, though how you would actually
realize it in any particular example is not. The overall idea is simply that instead of
waiting until the last moment and then rebuilding the data structure all at once, we’ll start
the rebuild sooner and work at it incrementally as we perform other operations. If we
update the new version faster than we update the original version, we’ll finish it by the
time we would have wanted to perform a batched rebuild, and we can just switch to this new version.</p>
<p>More concretely, though still quite vaguely, <strong>global rebuilding</strong> works as follows. When a
threshold is reached, we start rebuilding by creating a new “empty” version of the data
structure called the <em>shadow copy</em>. The original version is the <em>working copy</em>. Work on
rebuilding happens incrementally as operations are performed on the data structure. During
this period, we service queries from the working copy and continue to update it as usual.
Each update needs to make more progress on building the shadow copy than it worsens the
working copy. For example, an insert should insert more nodes into the shadow copy than
the working copy. Once the shadow copy is built, we may still have more work to do to
incorporate changes that occurred after we started the rebuild. To this end, we can
maintain a queue of update operations performed on the working copy since the start of
a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again,
we need to apply the updates from the queue at a fast enough rate so that we will
eventually catch up. Of course, all of this needs to happen fast enough so that 1)
the working copy doesn’t get too degraded before the shadow copy is ready, and 2)
we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.</p>
<!--more-->
<h2 id="coroutines">Coroutines</h2>
<p>Okasaki mentions in passing that global rebuilding “can be usefully viewed as running
the rebuilding transformation as a coroutine”. Also, the situation described above is
quite reminiscent of garbage collection. There the classic half-space stop-the-world
copying collector is naturally the batched rebuilding version. More incremental versions
often have read or write barriers and break the garbage collection into incremental
steps. Garbage collection is also often viewed as two processes coroutining.</p>
<p>The goal of this article is to derive global rebuilding-based data structures from
an expression of them as two coroutining processes. Ideally, we should be able to
take a data structure implemented via batched rebuilding and simply run the batched
rebuilding step as a coroutine. Modifying the data structure’s operations and the
rebuilding step should, in theory, just be a matter of inserting appropriate <code>yield</code>
statements. Of course, it won’t be that easy since the batched version of rebuilding
doesn’t need to worry about concurrent updates to the original data structure.</p>
<p>In theory, such a representation would be a perfectly effective way of articulating
the global rebuilding version of the data structure. That said, I will be using the
standard power move of CPS transforming and defunctionalizing to get a more data
structure-like result.</p>
<p>I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with
continuations. In that context, a “process” written in continuation-passing style
“yields” to the scheduler by passing its continuation to a scheduling function.
Normally, the scheduler would place that continuation at the end of a work queue,
then take a continuation from the front of the work queue and invoke it,
resuming the previously suspended “process”. In our case, we only have two
“processes”, so our “work queue” can just be a single mutable cell. When one
“process” yields, it swaps its continuation into the cell and the other
“process’s” continuation out, then invokes the continuation it read.</p>
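<p>For comparison with the two-process case, here is a minimal sketch of the general scheduler just described. It assumes cooperative processes written in continuation-passing style. <code>RunQueue</code>, <code>yieldTo</code>, <code>schedule</code>, and <code>worker</code> are hypothetical names and are not part of the <code>Coroutine</code> module. A plain list stands in for a proper queue.</p>

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- | The scheduler's work queue: continuations of suspended processes.
-- A plain list is used for brevity; appending to it is O(n).
type RunQueue = IORef [IO ()]

-- | Pick the continuation at the front of the work queue and resume it.
schedule :: RunQueue -> IO ()
schedule rq = do
  ks <- readIORef rq
  case ks of
    [] -> return ()                        -- nothing left to run
    (k : rest) -> writeIORef rq rest >> k

-- | Yield: put our own continuation at the back, then run the next process.
yieldTo :: RunQueue -> IO () -> IO ()
yieldTo rq k = modifyIORef rq (++ [k]) >> schedule rq

-- | An example process: logs a countdown, yielding after each step.
-- When it finishes, it simply hands control back to the scheduler.
worker :: RunQueue -> IORef [String] -> String -> Int -> IO ()
worker rq _ _ 0 = schedule rq
worker rq logRef name n = do
  modifyIORef logRef (++ [name ++ show n])
  yieldTo rq (worker rq logRef name (n - 1))

-- | Run a set of processes round-robin until all of them have finished.
runAll :: [RunQueue -> IO ()] -> IO ()
runAll procs = do
  rq <- newIORef []
  writeIORef rq (map ($ rq) procs)
  schedule rq
```

<p>Running a worker named A with 2 steps alongside a worker named B with 3 steps logs A2, B3, A1, B2, B1. Each yield sends the current process to the back of the queue.</p>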
<p>Since the rebuilding process is always driven by the main process, the pattern
is a bit more like generators. This has the benefit that only the rebuilding
process needs to be written in continuation-passing style. The following is
a very quick and dirty set of functions for this.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Coroutine</span> ( <span class="dt">YieldFn</span>, spawn ) <span class="kw">where</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> ( join )</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.IORef</span> ( <span class="dt">IORef</span>, newIORef, readIORef, writeIORef )</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">YieldFn</span> <span class="ot">=</span> <span class="dt">IO</span> () <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="ot">yield ::</span> <span class="dt">IORef</span> (<span class="dt">IO</span> ()) <span class="ot">-&gt;</span> <span class="dt">IO</span> () <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>yield <span class="ot">=</span> writeIORef</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="ot">resume ::</span> <span class="dt">IORef</span> (<span class="dt">IO</span> ()) <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>resume <span class="ot">=</span> join <span class="op">.</span> readIORef</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="ot">terminate ::</span> <span class="dt">IORef</span> (<span class="dt">IO</span> ()) <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a>terminate yieldRef <span class="ot">=</span> writeIORef yieldRef (<span class="fu">ioError</span> <span class="op">$</span> <span class="fu">userError</span> <span class="st">&quot;Subprocess completed&quot;</span>)</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="ot">spawn ::</span> (<span class="dt">YieldFn</span> <span class="ot">-&gt;</span> <span class="dt">IO</span> () <span class="ot">-&gt;</span> <span class="dt">IO</span> ()) <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">IO</span> ())</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>spawn process <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>    yieldRef <span class="ot">&lt;-</span> newIORef <span class="fu">undefined</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a>    writeIORef yieldRef <span class="op">$</span> process (yield yieldRef) (terminate yieldRef)</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> (resume yieldRef)</span></code></pre></div>
<p>A simple example of usage is:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> ( forM_ )</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Coroutine</span> ( <span class="dt">YieldFn</span>, spawn )</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="ot">process ::</span> <span class="dt">YieldFn</span> <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">IO</span> () <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>process     _ <span class="dv">0</span> k <span class="ot">=</span> k</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>process yield i k <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>    <span class="fu">putStrLn</span> <span class="op">$</span> <span class="st">&quot;Subprocess: &quot;</span> <span class="op">++</span> <span class="fu">show</span> i</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>    yield <span class="op">$</span> process yield (i<span class="op">-</span><span class="dv">1</span>) k</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="ot">example ::</span> <span class="dt">IO</span> ()</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>example <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>    resume <span class="ot">&lt;-</span> spawn <span class="op">$</span> \yield <span class="ot">-&gt;</span> process yield <span class="dv">10</span></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>    forM_ [(<span class="dv">1</span><span class="ot"> ::</span> <span class="dt">Int</span>) <span class="op">..</span> <span class="dv">10</span>] <span class="op">$</span> \i <span class="ot">-&gt;</span> <span class="kw">do</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>        <span class="fu">putStrLn</span> <span class="op">$</span> <span class="st">&quot;Main process: &quot;</span> <span class="op">++</span> <span class="fu">show</span> i</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>        resume</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a>    <span class="fu">putStrLn</span> <span class="st">&quot;Main process done&quot;</span></span></code></pre></div>
<p>with output:</p>
<pre><code>Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done</code></pre>
<h2 id="queues">Queues</h2>
<p>I’ll use queues since they are very simple and <em>Purely Functional Data Structures</em>
describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global
rebuilding. We’ll end up with something quite similar, which could be made even closer
by changing the rebuilding code. Indeed, as we’ll see, the differences are just artifacts of
specific, easily changed details of the rebuilding coroutine.</p>
<p>The examples I’ll present are mostly imperative, <em>not</em> purely functional. There
are two reasons for this. First, I’m not focused on purely functional data structures
and the technique works fine for imperative data structures. Second, it is arguably
more natural to talk about coroutines in an imperative context. In this case,
it’s easy to adapt the code to a purely functional version since it’s not much
more than a purely functional data structure stuck in an <code>IORef</code>.</p>
<p>For a more imperative data structure, with mutable linked nodes and/or in-place
array updates, it would be more challenging to produce a purely functional
version. The techniques here could still be used, though there are more
“concurrency” concerns. While I don’t include the code here, I did a similar
exercise for a random-access stack (a fancy way of saying a growable array).
There the “concurrency” concern is that the elements you are copying to the
new array may be popped and potentially overwritten before you switch to the
new array. In this case, the problem is easy to solve: if the head pointer of
the live version reaches the source offset of the copy, you can just switch to
the new array immediately.</p>
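<p>The random-access stack code isn’t included in this post. What follows is only a hypothetical sketch of how such a stack might look, not the author’s version. It assumes the <code>array</code> package’s <code>IOArray</code> and makes the following choices:</p>
<ul>
<li>It starts copying into a twice-as-large shadow array once the working array is half full.</li>
<li>It copies two elements per push.</li>
<li>It switches to the shadow array early when pops bring the stack height down to the copy offset.</li>
</ul>
<p>Shrinking is not handled.</p>

```haskell
import Data.Array.IO (IOArray, newArray_, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- | A growable array used as a stack. Capacity doubling is done by global
-- rebuilding: the copy into the larger shadow array is spread over pushes.
data Stack a = Stack
  { workRef   :: IORef (IOArray Int a, Int)  -- working copy and its capacity
  , sizeRef   :: IORef Int                   -- number of live elements
  , shadowRef :: IORef (Maybe (IOArray Int a, Int, Int))
    -- shadow copy, its capacity, and how many elements are copied so far
  }

newStack :: IO (Stack a)
newStack = do
  arr <- newArray_ (0, 1)
  Stack <$> newIORef (arr, 2) <*> newIORef 0 <*> newIORef Nothing

-- | Make the fully caught-up shadow copy the new working copy.
switchTo :: Stack a -> IOArray Int a -> Int -> IO ()
switchTo s arr cap = do
  writeIORef (workRef s) (arr, cap)
  writeIORef (shadowRef s) Nothing

-- | One increment of rebuilding: copy up to two more live elements.
copyStep :: Stack a -> IO ()
copyStep s = do
  shadow <- readIORef (shadowRef s)
  case shadow of
    Nothing -> return ()
    Just (new, newCap, copied) -> do
      (old, _) <- readIORef (workRef s)
      size <- readIORef (sizeRef s)
      let copied' = min size (copied + 2)
      mapM_ (\i -> readArray old i >>= writeArray new i) [copied .. copied' - 1]
      if copied' >= size
        then switchTo s new newCap
        else writeIORef (shadowRef s) (Just (new, newCap, copied'))

push :: a -> Stack a -> IO ()
push x s = do
  (arr, _) <- readIORef (workRef s)
  size <- readIORef (sizeRef s)
  writeArray arr size x  -- in bounds: the copy finishes before the array fills
  writeIORef (sizeRef s) (size + 1)
  copyStep s
  -- Start a rebuild once the working copy is half full.
  shadow <- readIORef (shadowRef s)
  (_, cap) <- readIORef (workRef s)
  case shadow of
    Nothing | 2 * (size + 1) >= cap -> do
      new <- newArray_ (0, 2 * cap - 1)
      writeIORef (shadowRef s) (Just (new, 2 * cap, 0))
    _ -> return ()

pop :: Stack a -> IO (Maybe a)
pop s = do
  size <- readIORef (sizeRef s)
  if size == 0
    then return Nothing
    else do
      (arr, _) <- readIORef (workRef s)
      x <- readArray arr (size - 1)
      writeIORef (sizeRef s) (size - 1)
      -- The head has come down to the copy offset, so every live element is
      -- already in the shadow copy. Switch now; otherwise a further pop then
      -- push could overwrite an already-copied slot of the old array.
      shadow <- readIORef (shadowRef s)
      case shadow of
        Just (new, newCap, copied) | copied >= size - 1 -> switchTo s new newCap
        _ -> return ()
      return (Just x)
```

<p>Copying two elements per push is what makes the pacing work. A rebuild starts with half the capacity left to copy and half the capacity free. Every push uses one free slot and reduces the uncopied count by one, and every pop frees a slot, so the copy finishes before the working array fills.</p>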
<p>Nevertheless, I can easily imagine scenarios where it may be beneficial, if
not necessary, for the coroutines to communicate more and/or for there to be
multiple “rebuild” processes. The approach used here could be easily adapted
to that. It’s also worth mentioning that even in simpler cases, non-constant-time
operations will either need to invoke <code>resume</code> multiple times or need more
coordination with the “rebuild” process to know when it can do more than a
constant amount of work. This could be accomplished by the “rebuild” process
simply recognizing this from the data structure state, or some state could
be explicitly set to indicate this, or the techniques described earlier
could be used, e.g. a different process for non-constant-time operations.</p>
<p>The code below uses the extensions <code>BangPatterns</code>, <code>RecordWildCards</code>, and <code>GADTs</code>.</p>
<h3 id="batched-rebuilding-implementation">Batched Rebuilding Implementation</h3>
<p>We start with the straightforward, amortized constant-time queues where
we push to a stack representing the back of the queue and pop from a stack
representing the front. When the front stack is empty, we need to expensively
reverse the back stack to make a new front stack.</p>
<p>I intentionally separate out the reverse step as an explicit <code>rebuild</code> function.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">BatchedRebuildingQueue</span> ( <span class="dt">Queue</span>, new, enqueue, dequeue ) <span class="kw">where</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.IORef</span> ( <span class="dt">IORef</span>, newIORef, readIORef, writeIORef, modifyIORef )</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> {</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ot">    queueRef ::</span> <span class="dt">IORef</span> ([a], [a])</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="ot">new ::</span> <span class="dt">IO</span> (<span class="dt">Queue</span> a)</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>new <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>    queueRef <span class="ot">&lt;-</span> newIORef ([], [])</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> <span class="dt">Queue</span> { <span class="op">..</span> }</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">Maybe</span> a)</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a>dequeue q<span class="op">@</span>(<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a>    (front, back) <span class="ot">&lt;-</span> readIORef queueRef</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> front <span class="kw">of</span></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>        (x<span class="op">:</span>front&#39;) <span class="ot">-&gt;</span> <span class="kw">do</span></span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a>            writeIORef queueRef (front&#39;, back)</span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a>            <span class="fu">return</span> (<span class="dt">Just</span> x)</span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a>        [] <span class="ot">-&gt;</span> <span class="kw">case</span> back <span class="kw">of</span></span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a>                [] <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="dt">Nothing</span></span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a>                _ <span class="ot">-&gt;</span> rebuild q <span class="op">&gt;&gt;</span> dequeue q</span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a>enqueue x (<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span></span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a>    modifyIORef queueRef (\(front, back) <span class="ot">-&gt;</span> (front, x<span class="op">:</span>back))</span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a><span class="ot">rebuild ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb4-29"><a href="#cb4-29" aria-hidden="true" tabindex="-1"></a>rebuild (<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span></span>
<span id="cb4-30"><a href="#cb4-30" aria-hidden="true" tabindex="-1"></a>    modifyIORef queueRef (\([], back) <span class="ot">-&gt;</span> (<span class="fu">reverse</span> back, []))</span></code></pre></div>
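<p>As noted earlier, this queue is little more than a purely functional data structure stuck in an <code>IORef</code>. For reference, here is a hypothetical pure version of the same two-stack queue. In it, <code>rebuild</code> is generalized to <code>front ++ reverse back</code>, which is just <code>reverse back</code> when <code>front</code> is empty.</p>

```haskell
-- | A purely functional two-stack queue: dequeue from the front stack,
-- enqueue onto the back stack (most recently enqueued element first).
data Queue a = Queue [a] [a]

emptyQueue :: Queue a
emptyQueue = Queue [] []

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue front back) = Queue front (x : back)

dequeue :: Queue a -> Maybe (a, Queue a)
dequeue (Queue (x : front) back) = Just (x, Queue front back)
dequeue (Queue [] []) = Nothing
dequeue q = dequeue (rebuild q)  -- front is empty but back is not

-- | The batched rebuild: an O(n) reverse of the back stack.
rebuild :: Queue a -> Queue a
rebuild (Queue front back) = Queue (front ++ reverse back) []

-- | Dequeue everything, in order.
drain :: Queue a -> [a]
drain q = maybe [] (\(x, q') -> x : drain q') (dequeue q)
```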
<h3 id="global-rebuilding-implementation">Global Rebuilding Implementation</h3>
<p>This step is where a modicum of thought is needed. We need to make the
<code>rebuild</code> step from the batched version incremental. This is straightforward,
if tedious, given the coroutine infrastructure. In this case, we incrementalize
the <code>reverse</code> by reimplementing <code>reverse</code> in CPS with some <code>yield</code> calls
inserted. Then we need to incrementalize append. Since we’re not waiting
until <code>front</code> is empty, we’re actually computing <code>front ++ reverse back</code>.
Incrementalizing append is hard, so we actually reverse <code>front</code> and then
use an incremental <code>reverseAppend</code> (which is basically what the incremental
reverse does anyway<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>).</p>
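<p>The rewrite relies on the identity <code>front ++ reverse back == reverseAppend (reverse front) (reverse back)</code>, where <code>reverseAppend xs ys == reverse xs ++ ys</code>. Below is a small pure sketch that checks it. Like <code>incrementalReverse</code>, its reverse handles two elements per step, but here the steps are counted instead of yielded. The helper names are hypothetical.</p>

```haskell
-- | reverseAppend xs ys == reverse xs ++ ys, moving one element at a time.
reverseAppend :: [a] -> [a] -> [a]
reverseAppend [] acc = acc
reverseAppend (x : xs) acc = reverseAppend xs (x : acc)

-- | Reverse a list two elements per step, like incrementalReverse, but
-- count the steps (the points where it would yield) instead of yielding.
reverseInSteps :: [a] -> ([a], Int)
reverseInSteps = go [] 0
  where
    go acc n [] = (acc, n)
    go acc n [x] = (x : acc, n)
    go acc n (x : y : xs) = go (y : x : acc) (n + 1) xs

-- | The new front stack, front ++ reverse back, computed using only
-- reversals and reverseAppend, i.e. without an incremental (++).
newFront :: [a] -> [a] -> [a]
newFront front back = reverseAppend rfront rback
  where
    (rback, _) = reverseInSteps back
    (rfront, _) = reverseInSteps front
```

<p>A list of length n takes n/2 steps, rounded down, to reverse. This is why a rebuild that starts early enough can stay ahead of the operations that drive it.</p>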
<p>One of the first things to note about this code is that the actual operations are
largely unchanged other than inserting calls to <code>resume</code>. In fact, <code>dequeue</code>
is even simpler than in the batched version as we can just assume that <code>front</code>
is always populated when the queue is not empty. <code>dequeue</code> is freed from the
responsibility of deciding when to trigger a rebuild. The bulk of
this code comes from reimplementing a <code>reverseAppend</code> function (twice).</p>
<p>The parts of this code that require some deeper thought are 1) knowing when
a rebuild should begin, 2) knowing how “fast” the incremental operations
should go<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>
(e.g. <code>incrementalReverse</code> does two steps at a time and the
Hood-Melville implementation has an explicit <code>exec2</code> that does two steps
at a time), and 3) dealing with “concurrent” changes.</p>
<p>For the last, Overmars describes a queue of deferred operations to perform
on the shadow copy once it finishes rebuilding. This suggests a
situation where the “rebuild” process can reference some “snapshot” of
the data structure. That is the situation we’re in, since
our data structures are essentially immutable data structures in an <code>IORef</code>.
However, that need not be the case, e.g. for the random-access stack.
Also, this operation queue approach can easily be inefficient and inelegant.
None of the implementations below will have this queue of deferred operations.
It is easier, more efficient, and more elegant to just not copy over parts of
the queue that have been dequeued, rather than have an extra phase of the
rebuilding that just pops off the elements of the <code>front</code> stack that we just
pushed. A similar situation happens for the random-access stack.</p>
<p>The use of <code>drop</code> could probably be easily eliminated. (I’m not even sure it’s
still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">GlobalRebuildingQueue</span> ( <span class="dt">Queue</span>, new, dequeue, enqueue ) <span class="kw">where</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.IORef</span> ( <span class="dt">IORef</span>, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef&#39; )</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Coroutine</span> ( <span class="dt">YieldFn</span>, spawn )</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> {</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="ot">    resume ::</span> <span class="dt">IO</span> (),</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="ot">    frontRef ::</span> <span class="dt">IORef</span> [a],</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="ot">    backRef ::</span> <span class="dt">IORef</span> [a],</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="ot">    frontCountRef ::</span> <span class="dt">IORef</span> <span class="dt">Int</span>,</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="ot">    backCountRef ::</span> <span class="dt">IORef</span> <span class="dt">Int</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="ot">new ::</span> <span class="dt">IO</span> (<span class="dt">Queue</span> a)</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a>new <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a>    frontRef <span class="ot">&lt;-</span> newIORef []</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a>    backRef <span class="ot">&lt;-</span> newIORef []</span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>    frontCountRef <span class="ot">&lt;-</span> newIORef <span class="dv">0</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>    backCountRef <span class="ot">&lt;-</span> newIORef <span class="dv">0</span></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a>    resume <span class="ot">&lt;-</span> spawn <span class="op">$</span> <span class="fu">const</span> <span class="op">.</span> rebuild frontRef backRef frontCountRef backCountRef</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> <span class="dt">Queue</span> { <span class="op">..</span> }</span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">Maybe</span> a)</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a>dequeue q <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a>    resume q</span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a>    front <span class="ot">&lt;-</span> readIORef (frontRef q)</span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> front <span class="kw">of</span></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a>        [] <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="dt">Nothing</span></span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a>        (x<span class="op">:</span>front&#39;) <span class="ot">-&gt;</span> <span class="kw">do</span></span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a>            modifyIORef&#39; (frontCountRef q) <span class="fu">pred</span></span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a>            writeIORef (frontRef q) front&#39;</span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a>            <span class="fu">return</span> (<span class="dt">Just</span> x)</span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a>enqueue x q <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a>    modifyIORef (backRef q) (x<span class="op">:</span>)</span>
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a>    modifyIORef&#39; (backCountRef q) <span class="fu">succ</span></span>
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a>    resume q</span>
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a><span class="ot">rebuild ::</span> <span class="dt">IORef</span> [a] <span class="ot">-&gt;</span> <span class="dt">IORef</span> [a] <span class="ot">-&gt;</span> <span class="dt">IORef</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">IORef</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">YieldFn</span> <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a>rebuild frontRef backRef frontCountRef backCountRef yield <span class="ot">=</span> <span class="kw">let</span> k <span class="ot">=</span> go k <span class="kw">in</span> go k <span class="kw">where</span></span>
<span id="cb5-41"><a href="#cb5-41" aria-hidden="true" tabindex="-1"></a>  go k <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-42"><a href="#cb5-42" aria-hidden="true" tabindex="-1"></a>    frontCount <span class="ot">&lt;-</span> readIORef frontCountRef</span>
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a>    backCount <span class="ot">&lt;-</span> readIORef backCountRef</span>
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> backCount <span class="op">&gt;</span> frontCount <span class="kw">then</span> <span class="kw">do</span></span>
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a>        back <span class="ot">&lt;-</span> readIORef backRef</span>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a>        front <span class="ot">&lt;-</span> readIORef frontRef</span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a>        writeIORef backRef []</span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a>        writeIORef backCountRef <span class="dv">0</span></span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a>        incrementalReverse back [] <span class="op">$</span> \rback <span class="ot">-&gt;</span></span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a>            incrementalReverse front [] <span class="op">$</span> \rfront <span class="ot">-&gt;</span></span>
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a>                incrementalRevAppend rfront rback <span class="dv">0</span> backCount k</span>
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span> <span class="kw">do</span></span>
<span id="cb5-53"><a href="#cb5-53" aria-hidden="true" tabindex="-1"></a>        yield k</span>
<span id="cb5-54"><a href="#cb5-54" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-55"><a href="#cb5-55" aria-hidden="true" tabindex="-1"></a>  incrementalReverse [] acc k <span class="ot">=</span> k acc</span>
<span id="cb5-56"><a href="#cb5-56" aria-hidden="true" tabindex="-1"></a>  incrementalReverse [x] acc k <span class="ot">=</span> k (x<span class="op">:</span>acc)</span>
<span id="cb5-57"><a href="#cb5-57" aria-hidden="true" tabindex="-1"></a>  incrementalReverse (x<span class="op">:</span>y<span class="op">:</span>xs) acc k <span class="ot">=</span> yield <span class="op">$</span> incrementalReverse xs (y<span class="op">:</span>x<span class="op">:</span>acc) k</span>
<span id="cb5-58"><a href="#cb5-58" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-59"><a href="#cb5-59" aria-hidden="true" tabindex="-1"></a>  incrementalRevAppend [] front <span class="op">!</span>movedCount backCount&#39; k <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-60"><a href="#cb5-60" aria-hidden="true" tabindex="-1"></a>    writeIORef frontRef front</span>
<span id="cb5-61"><a href="#cb5-61" aria-hidden="true" tabindex="-1"></a>    writeIORef frontCountRef <span class="op">$!</span> movedCount <span class="op">+</span> backCount&#39;</span>
<span id="cb5-62"><a href="#cb5-62" aria-hidden="true" tabindex="-1"></a>    yield k</span>
<span id="cb5-63"><a href="#cb5-63" aria-hidden="true" tabindex="-1"></a>  incrementalRevAppend (x<span class="op">:</span>rfront) acc <span class="op">!</span>movedCount backCount&#39; k <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-64"><a href="#cb5-64" aria-hidden="true" tabindex="-1"></a>    currentFrontCount <span class="ot">&lt;-</span> readIORef frontCountRef</span>
<span id="cb5-65"><a href="#cb5-65" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> currentFrontCount <span class="op">&lt;=</span> movedCount <span class="kw">then</span> <span class="kw">do</span></span>
<span id="cb5-66"><a href="#cb5-66" aria-hidden="true" tabindex="-1"></a>        <span class="co">-- This drop count should be bounded by a constant.</span></span>
<span id="cb5-67"><a href="#cb5-67" aria-hidden="true" tabindex="-1"></a>        writeIORef frontRef <span class="op">$!</span> <span class="fu">drop</span> (movedCount <span class="op">-</span> currentFrontCount) acc</span>
<span id="cb5-68"><a href="#cb5-68" aria-hidden="true" tabindex="-1"></a>        writeIORef frontCountRef <span class="op">$!</span> currentFrontCount <span class="op">+</span> backCount&#39;</span>
<span id="cb5-69"><a href="#cb5-69" aria-hidden="true" tabindex="-1"></a>        yield k</span>
<span id="cb5-70"><a href="#cb5-70" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span> <span class="kw">if</span> <span class="fu">null</span> rfront <span class="kw">then</span></span>
<span id="cb5-71"><a href="#cb5-71" aria-hidden="true" tabindex="-1"></a>        incrementalRevAppend [] (x<span class="op">:</span>acc) (movedCount <span class="op">+</span> <span class="dv">1</span>) backCount&#39; k</span>
<span id="cb5-72"><a href="#cb5-72" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span></span>
<span id="cb5-73"><a href="#cb5-73" aria-hidden="true" tabindex="-1"></a>        yield <span class="op">$!</span> incrementalRevAppend rfront (x<span class="op">:</span>acc) (movedCount <span class="op">+</span> <span class="dv">1</span>) backCount&#39; k</span></code></pre></div>
<h3 id="defunctionalized-global-rebuilding-implementation">Defunctionalized Global Rebuilding Implementation</h3>
<p>This step is completely mechanical.</p>
<p>There’s arguably no reason to defunctionalize. It produces a result that
is more data-structure-like, but, unless you need the code to work in a
first-order language, there’s nothing really gained by doing this. It does
lead to a result that is more directly comparable to other implementations.</p>
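<p>To illustrate what “mechanical” means here, the following toy sketch defunctionalizes a function written in continuation-passing style. The names <code>sumK</code>, <code>SumKont</code>, <code>applySumKont</code>, and <code>sumD</code> are made up for this example and are not part of the queue code. Each lambda passed as a continuation becomes a constructor that stores the lambda’s free variables, and invoking a continuation becomes a call to a first-order apply function.</p>

```haskell
{-# LANGUAGE GADTSyntax #-}

-- Continuation-passing style: the continuation is a higher-order function.
sumK :: [Int] -> (Int -> r) -> r
sumK [] k = k 0
sumK (x:xs) k = sumK xs (\s -> k (x + s))

-- Defunctionalized continuations: one constructor per lambda,
-- each carrying that lambda's free variables.
data SumKont where
  Done :: SumKont                       -- stands for the initial continuation 'id'
  AddThen :: Int -> SumKont -> SumKont  -- stands for (\s -> k (x + s)), capturing x and k

-- Applying a continuation is now an ordinary first-order function.
applySumKont :: SumKont -> Int -> Int
applySumKont Done s = s
applySumKont (AddThen x k) s = applySumKont k (x + s)

-- The same traversal as sumK, but building SumKont values instead of closures.
sumD :: [Int] -> SumKont -> Int
sumD [] k = applySumKont k 0
sumD (x:xs) k = sumD xs (AddThen x k)
```

<p>Note that <code>SumKont</code> is just a list in disguise, which is a small-scale version of the “more data-structure-like” result mentioned above.</p>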
<p>For some data structures, having the continuation be analyzable would
provide a simple means for the coroutines to communicate. The main process
could look directly at the continuation to determine its state, e.g. whether
a rebuild is in progress at all. The main process could also directly
manipulate the stored continuation to change the “rebuild” process’
behavior. That said, doing this would mean that we’re no longer <em>deriving</em>
the implementation. Still, the opportunity for additional optimizations
and simplifications is nice.</p>
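<p>As a rough sketch of what analyzing the continuation might look like, the hypothetical helpers <code>rebuildInProgress</code> and <code>rebuildPhase</code> below pattern-match on a copy of the <code>Kont</code> type from the implementation that follows. They are not part of that implementation, and the phase names are my own labels.</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Copy of the Kont type from the defunctionalized queue.
data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

-- Hypothetical helper: any stored continuation other than IDLE
-- means a rebuild has started but not yet finished.
rebuildInProgress :: Kont a () -> Bool
rebuildInProgress IDLE = False
rebuildInProgress _ = True

-- Hypothetical helper: which part of the rebuild is pending.
-- While the back list is being reversed, the saved continuation wraps
-- REVERSE_FRONT; while the front list is being reversed, it wraps REV_APPEND_START.
rebuildPhase :: Kont a () -> String
rebuildPhase IDLE = "idle"
rebuildPhase (REVERSE_STEP _ _ (REVERSE_FRONT _ _)) = "reversing back"
rebuildPhase (REVERSE_STEP _ _ (REV_APPEND_START _ _)) = "reversing front"
rebuildPhase (REV_APPEND_STEP _ _ _ _) = "appending"
```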
<p>As a minor aside, while it is, of course, obvious from looking at the
previous version of the code, it’s neat how the <code>Kont</code> data type
implies that the call stack is bounded and that most calls are tail calls.
<code>REVERSE_STEP</code> is the only constructor that contains a <code>Kont</code> argument,
and its type means that this argument can’t itself be a <code>REVERSE_STEP</code>:
the argument must have type <code>Kont a [a]</code>, while <code>REVERSE_STEP</code>
builds a <code>Kont a ()</code>. Again, I just find it neat how defunctionalization
makes this concrete and explicit.</p>
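<p>One way to see this concretely is a hypothetical <code>kontDepth</code> function (not part of the queue code) that counts nested continuation frames in a copy of the <code>Kont</code> type. Every well-typed value has depth at most two, and the commented-out definition at the end of the block is rejected by the type checker.</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Copy of the Kont type from the defunctionalized queue.
data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

-- Hypothetical helper: number of continuation frames nested in a Kont.
-- Only REVERSE_STEP recurses, and its argument is a Kont a [a], which can
-- only be REVERSE_FRONT or REV_APPEND_START, so the result is at most 2.
kontDepth :: Kont a r -> Int
kontDepth IDLE = 1
kontDepth (REVERSE_STEP _ _ k) = 1 + kontDepth k
kontDepth (REVERSE_FRONT _ _) = 1
kontDepth (REV_APPEND_START _ _) = 1
kontDepth (REV_APPEND_STEP _ _ _ _) = 1

-- Rejected by the type checker if uncommented: the inner REVERSE_STEP has
-- type Kont a (), but the outer REVERSE_STEP needs a Kont a [a].
-- bad = REVERSE_STEP [] [] (REVERSE_STEP [] [] IDLE)
```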
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-0"><a href="#cb6-0" aria-hidden="true" tabindex="-1"></a><span class="pp">{-# LANGUAGE GADTs, RecordWildCards, BangPatterns #-}</span></span>
<span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">DefunctionalizedQueue</span> ( <span class="dt">Queue</span>, new, dequeue, enqueue ) <span class="kw">where</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.IORef</span> ( <span class="dt">IORef</span>, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef&#39; )</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Kont</span> a r <span class="kw">where</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>  <span class="dt">IDLE</span><span class="ot"> ::</span> <span class="dt">Kont</span> a ()</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REVERSE_STEP</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a] <span class="ot">-&gt;</span> <span class="dt">Kont</span> a ()</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REVERSE_FRONT</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a]</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REV_APPEND_START</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a]</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REV_APPEND_STEP</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Kont</span> a ()</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a><span class="ot">applyKont ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Kont</span> a r <span class="ot">-&gt;</span> r <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a>applyKont q <span class="dt">IDLE</span> _ <span class="ot">=</span> rebuildLoop q</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REVERSE_STEP</span> xs acc k) _ <span class="ot">=</span> incrementalReverse q xs acc k</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REVERSE_FRONT</span> front backCount) rback <span class="ot">=</span></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a>    incrementalReverse q front [] <span class="op">$</span> <span class="dt">REV_APPEND_START</span> rback backCount</span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REV_APPEND_START</span> rback backCount) rfront <span class="ot">=</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a>    incrementalRevAppend q rfront rback <span class="dv">0</span> backCount</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REV_APPEND_STEP</span> rfront acc movedCount backCount) _ <span class="ot">=</span></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a>    incrementalRevAppend q rfront acc movedCount backCount</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a><span class="ot">rebuildLoop ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a>rebuildLoop q<span class="op">@</span>(<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a>    frontCount <span class="ot">&lt;-</span> readIORef frontCountRef</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a>    backCount <span class="ot">&lt;-</span> readIORef backCountRef</span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> backCount <span class="op">&gt;</span> frontCount <span class="kw">then</span> <span class="kw">do</span></span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a>        back <span class="ot">&lt;-</span> readIORef backRef</span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true" tabindex="-1"></a>        front <span class="ot">&lt;-</span> readIORef frontRef</span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true" tabindex="-1"></a>        writeIORef backRef []</span>
<span id="cb6-29"><a href="#cb6-29" aria-hidden="true" tabindex="-1"></a>        writeIORef backCountRef <span class="dv">0</span></span>
<span id="cb6-30"><a href="#cb6-30" aria-hidden="true" tabindex="-1"></a>        incrementalReverse q back [] <span class="op">$</span> <span class="dt">REVERSE_FRONT</span> front backCount</span>
<span id="cb6-31"><a href="#cb6-31" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span> <span class="kw">do</span></span>
<span id="cb6-32"><a href="#cb6-32" aria-hidden="true" tabindex="-1"></a>        writeIORef resumeRef <span class="dt">IDLE</span></span>
<span id="cb6-33"><a href="#cb6-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-34"><a href="#cb6-34" aria-hidden="true" tabindex="-1"></a><span class="ot">incrementalReverse ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a] <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb6-35"><a href="#cb6-35" aria-hidden="true" tabindex="-1"></a>incrementalReverse q [] acc k <span class="ot">=</span> applyKont q k acc</span>
<span id="cb6-36"><a href="#cb6-36" aria-hidden="true" tabindex="-1"></a>incrementalReverse q [x] acc k <span class="ot">=</span> applyKont q k (x<span class="op">:</span>acc)</span>
<span id="cb6-37"><a href="#cb6-37" aria-hidden="true" tabindex="-1"></a>incrementalReverse q (x<span class="op">:</span>y<span class="op">:</span>xs) acc k <span class="ot">=</span> writeIORef (resumeRef q) <span class="op">$</span> <span class="dt">REVERSE_STEP</span> xs (y<span class="op">:</span>x<span class="op">:</span>acc) k</span>
<span id="cb6-38"><a href="#cb6-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-39"><a href="#cb6-39" aria-hidden="true" tabindex="-1"></a><span class="ot">incrementalRevAppend ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb6-40"><a href="#cb6-40" aria-hidden="true" tabindex="-1"></a>incrementalRevAppend (<span class="dt">Queue</span> { <span class="op">..</span> }) [] front <span class="op">!</span>movedCount backCount&#39; <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-41"><a href="#cb6-41" aria-hidden="true" tabindex="-1"></a>    writeIORef frontRef front</span>
<span id="cb6-42"><a href="#cb6-42" aria-hidden="true" tabindex="-1"></a>    writeIORef frontCountRef <span class="op">$!</span> movedCount <span class="op">+</span> backCount&#39;</span>
<span id="cb6-43"><a href="#cb6-43" aria-hidden="true" tabindex="-1"></a>    writeIORef resumeRef <span class="dt">IDLE</span></span>
<span id="cb6-44"><a href="#cb6-44" aria-hidden="true" tabindex="-1"></a>incrementalRevAppend q<span class="op">@</span>(<span class="dt">Queue</span> { <span class="op">..</span> }) (x<span class="op">:</span>rfront) acc <span class="op">!</span>movedCount backCount&#39; <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-45"><a href="#cb6-45" aria-hidden="true" tabindex="-1"></a>    currentFrontCount <span class="ot">&lt;-</span> readIORef frontCountRef</span>
<span id="cb6-46"><a href="#cb6-46" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> currentFrontCount <span class="op">&lt;=</span> movedCount <span class="kw">then</span> <span class="kw">do</span></span>
<span id="cb6-47"><a href="#cb6-47" aria-hidden="true" tabindex="-1"></a>        <span class="co">-- This drop count should be bounded by a constant.</span></span>
<span id="cb6-48"><a href="#cb6-48" aria-hidden="true" tabindex="-1"></a>        writeIORef frontRef <span class="op">$!</span> <span class="fu">drop</span> (movedCount <span class="op">-</span> currentFrontCount) acc</span>
<span id="cb6-49"><a href="#cb6-49" aria-hidden="true" tabindex="-1"></a>        writeIORef frontCountRef <span class="op">$!</span> currentFrontCount <span class="op">+</span> backCount&#39;</span>
<span id="cb6-50"><a href="#cb6-50" aria-hidden="true" tabindex="-1"></a>        writeIORef resumeRef <span class="dt">IDLE</span></span>
<span id="cb6-51"><a href="#cb6-51" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span> <span class="kw">if</span> <span class="fu">null</span> rfront <span class="kw">then</span></span>
<span id="cb6-52"><a href="#cb6-52" aria-hidden="true" tabindex="-1"></a>        incrementalRevAppend q [] (x<span class="op">:</span>acc) (movedCount <span class="op">+</span> <span class="dv">1</span>) backCount&#39;</span>
<span id="cb6-53"><a href="#cb6-53" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span></span>
<span id="cb6-54"><a href="#cb6-54" aria-hidden="true" tabindex="-1"></a>        writeIORef resumeRef <span class="op">$!</span> <span class="dt">REV_APPEND_STEP</span> rfront (x<span class="op">:</span>acc) (movedCount <span class="op">+</span> <span class="dv">1</span>) backCount&#39;</span>
<span id="cb6-55"><a href="#cb6-55" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-56"><a href="#cb6-56" aria-hidden="true" tabindex="-1"></a><span class="ot">resume ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb6-57"><a href="#cb6-57" aria-hidden="true" tabindex="-1"></a>resume q <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-58"><a href="#cb6-58" aria-hidden="true" tabindex="-1"></a>    kont <span class="ot">&lt;-</span> readIORef (resumeRef q)</span>
<span id="cb6-59"><a href="#cb6-59" aria-hidden="true" tabindex="-1"></a>    applyKont q kont ()</span>
<span id="cb6-60"><a href="#cb6-60" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-61"><a href="#cb6-61" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> {</span>
<span id="cb6-62"><a href="#cb6-62" aria-hidden="true" tabindex="-1"></a><span class="ot">    resumeRef ::</span> <span class="dt">IORef</span> (<span class="dt">Kont</span> a ()),</span>
<span id="cb6-63"><a href="#cb6-63" aria-hidden="true" tabindex="-1"></a><span class="ot">    frontRef ::</span> <span class="dt">IORef</span> [a],</span>
<span id="cb6-64"><a href="#cb6-64" aria-hidden="true" tabindex="-1"></a><span class="ot">    backRef ::</span> <span class="dt">IORef</span> [a],</span>
<span id="cb6-65"><a href="#cb6-65" aria-hidden="true" tabindex="-1"></a><span class="ot">    frontCountRef ::</span> <span class="dt">IORef</span> <span class="dt">Int</span>,</span>
<span id="cb6-66"><a href="#cb6-66" aria-hidden="true" tabindex="-1"></a><span class="ot">    backCountRef ::</span> <span class="dt">IORef</span> <span class="dt">Int</span></span>
<span id="cb6-67"><a href="#cb6-67" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb6-68"><a href="#cb6-68" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-69"><a href="#cb6-69" aria-hidden="true" tabindex="-1"></a><span class="ot">new ::</span> <span class="dt">IO</span> (<span class="dt">Queue</span> a)</span>
<span id="cb6-70"><a href="#cb6-70" aria-hidden="true" tabindex="-1"></a>new <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-71"><a href="#cb6-71" aria-hidden="true" tabindex="-1"></a>    frontRef <span class="ot">&lt;-</span> newIORef []</span>
<span id="cb6-72"><a href="#cb6-72" aria-hidden="true" tabindex="-1"></a>    backRef <span class="ot">&lt;-</span> newIORef []</span>
<span id="cb6-73"><a href="#cb6-73" aria-hidden="true" tabindex="-1"></a>    frontCountRef <span class="ot">&lt;-</span> newIORef <span class="dv">0</span></span>
<span id="cb6-74"><a href="#cb6-74" aria-hidden="true" tabindex="-1"></a>    backCountRef <span class="ot">&lt;-</span> newIORef <span class="dv">0</span></span>
<span id="cb6-75"><a href="#cb6-75" aria-hidden="true" tabindex="-1"></a>    resumeRef <span class="ot">&lt;-</span> newIORef <span class="dt">IDLE</span></span>
<span id="cb6-76"><a href="#cb6-76" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> <span class="dt">Queue</span> { <span class="op">..</span> }</span>
<span id="cb6-77"><a href="#cb6-77" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-78"><a href="#cb6-78" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">Maybe</span> a)</span>
<span id="cb6-79"><a href="#cb6-79" aria-hidden="true" tabindex="-1"></a>dequeue q  <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-80"><a href="#cb6-80" aria-hidden="true" tabindex="-1"></a>    resume q</span>
<span id="cb6-81"><a href="#cb6-81" aria-hidden="true" tabindex="-1"></a>    front <span class="ot">&lt;-</span> readIORef (frontRef q)</span>
<span id="cb6-82"><a href="#cb6-82" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> front <span class="kw">of</span></span>
<span id="cb6-83"><a href="#cb6-83" aria-hidden="true" tabindex="-1"></a>        [] <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="dt">Nothing</span></span>
<span id="cb6-84"><a href="#cb6-84" aria-hidden="true" tabindex="-1"></a>        (x<span class="op">:</span>front&#39;) <span class="ot">-&gt;</span> <span class="kw">do</span></span>
<span id="cb6-85"><a href="#cb6-85" aria-hidden="true" tabindex="-1"></a>            modifyIORef&#39; (frontCountRef q) <span class="fu">pred</span></span>
<span id="cb6-86"><a href="#cb6-86" aria-hidden="true" tabindex="-1"></a>            writeIORef (frontRef q) front&#39;</span>
<span id="cb6-87"><a href="#cb6-87" aria-hidden="true" tabindex="-1"></a>            <span class="fu">return</span> (<span class="dt">Just</span> x)</span>
<span id="cb6-88"><a href="#cb6-88" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-89"><a href="#cb6-89" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb6-90"><a href="#cb6-90" aria-hidden="true" tabindex="-1"></a>enqueue x q <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-91"><a href="#cb6-91" aria-hidden="true" tabindex="-1"></a>    modifyIORef (backRef q) (x<span class="op">:</span>)</span>
<span id="cb6-92"><a href="#cb6-92" aria-hidden="true" tabindex="-1"></a>    modifyIORef&#39; (backCountRef q) <span class="fu">succ</span></span>
<span id="cb6-93"><a href="#cb6-93" aria-hidden="true" tabindex="-1"></a>    resume q</span></code></pre></div>
<h3 id="functional-defunctionalized-global-rebuilding-implementation">Functional Defunctionalized Global Rebuilding Implementation</h3>
<p>This is just a straightforward reorganization of the previous code into purely
functional code. The result is a persistent queue with worst-case constant-time
operations.</p>
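<p>The reorganization follows one pattern throughout: each <code>IORef</code> field becomes a plain record field, each <code>writeIORef</code> becomes a record update, and each <code>IO ()</code> function becomes a function that returns the updated <code>Queue</code>. The hypothetical toy counter below (not part of the queue code) shows both styles side by side, along with what persistence buys: the old version is still usable after an update.</p>

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Imperative style: the state lives in a mutable reference.
newtype CounterRef = CounterRef { countRef :: IORef Int }

newCounterRef :: Int -> IO CounterRef
newCounterRef n = CounterRef <$> newIORef n

readCount :: CounterRef -> IO Int
readCount = readIORef . countRef

bumpIO :: CounterRef -> IO ()
bumpIO c = modifyIORef' (countRef c) (+ 1)

-- Purely functional style: the same update expressed as a record update.
newtype Counter = Counter { count :: Int }

bump :: Counter -> Counter
bump c = c { count = count c + 1 }

-- Persistence: updating v0 produces v1 without destroying v0.
versions :: (Int, Int)
versions = (count v0, count v1)
  where
    v0 = Counter 0
    v1 = bump v0
```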
<p>It is, of course, far uglier and more ad hoc than Okasaki’s
extremely elegant real-time queues, but the methodology used to derive it was
simple-minded. The result is also quite similar to the <a href="#hood-melville-implementation">Hood-Melville queues</a>,
even though I did not set out to achieve that. In fact, I’m fairly
confident you could derive almost <em>exactly</em> the Hood-Melville queues
with just minor modifications to the <a href="#global-rebuilding-implementation">Global Rebuilding Implementation</a>.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-0"><a href="#cb7-0" aria-hidden="true" tabindex="-1"></a><span class="pp">{-# LANGUAGE GADTs, RecordWildCards, BangPatterns #-}</span></span>
<span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">FunctionalQueue</span> ( <span class="dt">Queue</span>, empty, dequeue, enqueue ) <span class="kw">where</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Kont</span> a r <span class="kw">where</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>  <span class="dt">IDLE</span><span class="ot"> ::</span> <span class="dt">Kont</span> a ()</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REVERSE_STEP</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a] <span class="ot">-&gt;</span> <span class="dt">Kont</span> a ()</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REVERSE_FRONT</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a]</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REV_APPEND_START</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a]</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>  <span class="dt">REV_APPEND_STEP</span><span class="ot"> ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="op">!</span><span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Kont</span> a ()</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a><span class="ot">applyKont ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Kont</span> a r <span class="ot">-&gt;</span> r <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a>applyKont q <span class="dt">IDLE</span> _ <span class="ot">=</span> rebuildLoop q</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REVERSE_STEP</span> xs acc k) _ <span class="ot">=</span> incrementalReverse q xs acc k</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REVERSE_FRONT</span> front backCount) rback <span class="ot">=</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a>    incrementalReverse q front [] <span class="op">$</span> <span class="dt">REV_APPEND_START</span> rback backCount</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REV_APPEND_START</span> rback backCount) rfront <span class="ot">=</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a>    incrementalRevAppend q rfront rback <span class="dv">0</span> backCount</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a>applyKont q (<span class="dt">REV_APPEND_STEP</span> rfront acc movedCount backCount) _ <span class="ot">=</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a>    incrementalRevAppend q rfront acc movedCount backCount</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a><span class="ot">rebuildLoop ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a>rebuildLoop q<span class="op">@</span>(<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> backCount <span class="op">&gt;</span> frontCount <span class="kw">then</span></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a>        <span class="kw">let</span> q&#39; <span class="ot">=</span> q { back <span class="ot">=</span> [], backCount <span class="ot">=</span> <span class="dv">0</span> } <span class="kw">in</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a>        incrementalReverse q&#39; back [] <span class="op">$</span> <span class="dt">REVERSE_FRONT</span> front backCount</span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span></span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a>        q { resumeKont <span class="ot">=</span> <span class="dt">IDLE</span> }</span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a><span class="ot">incrementalReverse ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Kont</span> a [a] <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a>incrementalReverse q [] acc k <span class="ot">=</span> applyKont q k acc</span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a>incrementalReverse q [x] acc k <span class="ot">=</span> applyKont q k (x<span class="op">:</span>acc)</span>
<span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a>incrementalReverse q (x<span class="op">:</span>y<span class="op">:</span>xs) acc k <span class="ot">=</span> q { resumeKont <span class="ot">=</span> <span class="dt">REVERSE_STEP</span> xs (y<span class="op">:</span>x<span class="op">:</span>acc) k }</span>
<span id="cb7-32"><a href="#cb7-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-33"><a href="#cb7-33" aria-hidden="true" tabindex="-1"></a><span class="ot">incrementalRevAppend ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb7-34"><a href="#cb7-34" aria-hidden="true" tabindex="-1"></a>incrementalRevAppend q [] front&#39; <span class="op">!</span>movedCount backCount&#39; <span class="ot">=</span></span>
<span id="cb7-35"><a href="#cb7-35" aria-hidden="true" tabindex="-1"></a>    q { front <span class="ot">=</span> front&#39;, frontCount <span class="ot">=</span> movedCount <span class="op">+</span> backCount&#39;, resumeKont <span class="ot">=</span> <span class="dt">IDLE</span> }</span>
<span id="cb7-36"><a href="#cb7-36" aria-hidden="true" tabindex="-1"></a>incrementalRevAppend q (x<span class="op">:</span>rfront) acc <span class="op">!</span>movedCount backCount&#39; <span class="ot">=</span></span>
<span id="cb7-37"><a href="#cb7-37" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> frontCount q <span class="op">&lt;=</span> movedCount <span class="kw">then</span></span>
<span id="cb7-38"><a href="#cb7-38" aria-hidden="true" tabindex="-1"></a>        <span class="co">-- This drop count should be bounded by a constant.</span></span>
<span id="cb7-39"><a href="#cb7-39" aria-hidden="true" tabindex="-1"></a>        <span class="kw">let</span> <span class="op">!</span>front <span class="ot">=</span> <span class="fu">drop</span> (movedCount <span class="op">-</span> frontCount q) acc <span class="kw">in</span></span>
<span id="cb7-40"><a href="#cb7-40" aria-hidden="true" tabindex="-1"></a>        q { front <span class="ot">=</span> front, frontCount <span class="ot">=</span> frontCount q <span class="op">+</span> backCount&#39;, resumeKont <span class="ot">=</span> <span class="dt">IDLE</span> }</span>
<span id="cb7-41"><a href="#cb7-41" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span> <span class="kw">if</span> <span class="fu">null</span> rfront <span class="kw">then</span></span>
<span id="cb7-42"><a href="#cb7-42" aria-hidden="true" tabindex="-1"></a>        incrementalRevAppend q [] (x<span class="op">:</span>acc) (movedCount <span class="op">+</span> <span class="dv">1</span>) backCount&#39;</span>
<span id="cb7-43"><a href="#cb7-43" aria-hidden="true" tabindex="-1"></a>      <span class="kw">else</span></span>
<span id="cb7-44"><a href="#cb7-44" aria-hidden="true" tabindex="-1"></a>        q { resumeKont <span class="ot">=</span> <span class="dt">REV_APPEND_STEP</span> rfront (x<span class="op">:</span>acc) (movedCount <span class="op">+</span> <span class="dv">1</span>) backCount&#39; }</span>
<span id="cb7-45"><a href="#cb7-45" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a><span class="ot">resume ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a>resume q <span class="ot">=</span> applyKont q (resumeKont q) ()</span>
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> {</span>
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a><span class="ot">    resumeKont ::</span> <span class="op">!</span>(<span class="dt">Kont</span> a ()),</span>
<span id="cb7-51"><a href="#cb7-51" aria-hidden="true" tabindex="-1"></a><span class="ot">    front ::</span> [a],</span>
<span id="cb7-52"><a href="#cb7-52" aria-hidden="true" tabindex="-1"></a><span class="ot">    back ::</span> [a],</span>
<span id="cb7-53"><a href="#cb7-53" aria-hidden="true" tabindex="-1"></a><span class="ot">    frontCount ::</span> <span class="op">!</span><span class="dt">Int</span>,</span>
<span id="cb7-54"><a href="#cb7-54" aria-hidden="true" tabindex="-1"></a><span class="ot">    backCount ::</span> <span class="op">!</span><span class="dt">Int</span></span>
<span id="cb7-55"><a href="#cb7-55" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-56"><a href="#cb7-56" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-57"><a href="#cb7-57" aria-hidden="true" tabindex="-1"></a><span class="ot">empty ::</span> <span class="dt">Queue</span> a</span>
<span id="cb7-58"><a href="#cb7-58" aria-hidden="true" tabindex="-1"></a>empty <span class="ot">=</span> <span class="dt">Queue</span> { resumeKont <span class="ot">=</span> <span class="dt">IDLE</span>, front <span class="ot">=</span> [], back <span class="ot">=</span> [], frontCount <span class="ot">=</span> <span class="dv">0</span>, backCount <span class="ot">=</span> <span class="dv">0</span> }</span>
<span id="cb7-59"><a href="#cb7-59" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-60"><a href="#cb7-60" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> (<span class="dt">Maybe</span> a, <span class="dt">Queue</span> a)</span>
<span id="cb7-61"><a href="#cb7-61" aria-hidden="true" tabindex="-1"></a>dequeue q <span class="ot">=</span></span>
<span id="cb7-62"><a href="#cb7-62" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> front <span class="kw">of</span></span>
<span id="cb7-63"><a href="#cb7-63" aria-hidden="true" tabindex="-1"></a>        [] <span class="ot">-&gt;</span> (<span class="dt">Nothing</span>, q&#39;)</span>
<span id="cb7-64"><a href="#cb7-64" aria-hidden="true" tabindex="-1"></a>        (x<span class="op">:</span>front&#39;) <span class="ot">-&gt;</span></span>
<span id="cb7-65"><a href="#cb7-65" aria-hidden="true" tabindex="-1"></a>            (<span class="dt">Just</span> x, q&#39; { front <span class="ot">=</span> front&#39;, frontCount <span class="ot">=</span> frontCount <span class="op">-</span> <span class="dv">1</span> })</span>
<span id="cb7-66"><a href="#cb7-66" aria-hidden="true" tabindex="-1"></a>  <span class="kw">where</span> q&#39;<span class="op">@</span>(<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span> resume q</span>
<span id="cb7-67"><a href="#cb7-67" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-68"><a href="#cb7-68" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb7-69"><a href="#cb7-69" aria-hidden="true" tabindex="-1"></a>enqueue x q<span class="op">@</span>(<span class="dt">Queue</span> { <span class="op">..</span> }) <span class="ot">=</span> resume (q { back <span class="ot">=</span> x<span class="op">:</span>back, backCount <span class="ot">=</span> backCount <span class="op">+</span> <span class="dv">1</span> })</span></code></pre></div>
<h3 id="hood-melville-implementation">Hood-Melville Implementation</h3>
<p>This is just the Haskell code from <em>Purely Functional Data Structures</em> adapted
to the interface of the other examples.</p>
<p>This code is mostly for comparison. The biggest difference, other than some code
structuring differences, is that the front and back lists are reversed in parallel,
while my code reverses them sequentially. As mentioned before, getting a structure
like that would simply be a matter of defining a parallel incremental reverse
back in the <a href="#global-rebuilding-implementation">Global Rebuilding Implementation</a>.</p>
<p>Again, Okasaki’s real-time queue, which can be seen as an application of the
<em>lazy</em> rebuilding and scheduling techniques described in his thesis and book,
is a better implementation than this in pretty much every way.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">HoodMelvilleQueue</span> (<span class="dt">Queue</span>, empty, dequeue, enqueue) <span class="kw">where</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">RotationState</span> a</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>  <span class="ot">=</span> <span class="dt">Idle</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>  <span class="op">|</span> <span class="dt">Reversing</span> <span class="op">!</span><span class="dt">Int</span> [a] [a] [a] [a]</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>  <span class="op">|</span> <span class="dt">Appending</span> <span class="op">!</span><span class="dt">Int</span> [a] [a]</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a>  <span class="op">|</span> <span class="dt">Done</span> [a]</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> <span class="op">!</span><span class="dt">Int</span> [a] (<span class="dt">RotationState</span> a) <span class="op">!</span><span class="dt">Int</span> [a]</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a><span class="ot">exec ::</span> <span class="dt">RotationState</span> a <span class="ot">-&gt;</span> <span class="dt">RotationState</span> a</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a>exec (<span class="dt">Reversing</span> ok (x<span class="op">:</span>f) f&#39; (y<span class="op">:</span>r) r&#39;) <span class="ot">=</span> <span class="dt">Reversing</span> (ok<span class="op">+</span><span class="dv">1</span>) f (x<span class="op">:</span>f&#39;) r (y<span class="op">:</span>r&#39;)</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a>exec (<span class="dt">Reversing</span> ok [] f&#39; [y] r&#39;) <span class="ot">=</span> <span class="dt">Appending</span> ok f&#39; (y<span class="op">:</span>r&#39;)</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>exec (<span class="dt">Appending</span> <span class="dv">0</span> f&#39; r&#39;) <span class="ot">=</span> <span class="dt">Done</span> r&#39;</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a>exec (<span class="dt">Appending</span> ok (x<span class="op">:</span>f&#39;) r&#39;) <span class="ot">=</span> <span class="dt">Appending</span> (ok<span class="op">-</span><span class="dv">1</span>) f&#39; (x<span class="op">:</span>r&#39;)</span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a>exec state <span class="ot">=</span> state</span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a><span class="ot">invalidate ::</span> <span class="dt">RotationState</span> a <span class="ot">-&gt;</span> <span class="dt">RotationState</span> a</span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a>invalidate (<span class="dt">Reversing</span> ok f f&#39; r r&#39;) <span class="ot">=</span> <span class="dt">Reversing</span> (ok<span class="op">-</span><span class="dv">1</span>) f f&#39; r r&#39;</span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a>invalidate (<span class="dt">Appending</span> <span class="dv">0</span> f&#39; (x<span class="op">:</span>r&#39;)) <span class="ot">=</span> <span class="dt">Done</span> r&#39;</span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a>invalidate (<span class="dt">Appending</span> ok f&#39; r&#39;) <span class="ot">=</span> <span class="dt">Appending</span> (ok<span class="op">-</span><span class="dv">1</span>) f&#39; r&#39;</span>
<span id="cb8-22"><a href="#cb8-22" aria-hidden="true" tabindex="-1"></a>invalidate state <span class="ot">=</span> state</span>
<span id="cb8-23"><a href="#cb8-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-24"><a href="#cb8-24" aria-hidden="true" tabindex="-1"></a><span class="ot">exec2 ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">RotationState</span> a <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb8-25"><a href="#cb8-25" aria-hidden="true" tabindex="-1"></a>exec2 <span class="op">!</span>lenf f state lenr r <span class="ot">=</span></span>
<span id="cb8-26"><a href="#cb8-26" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> exec (exec state) <span class="kw">of</span></span>
<span id="cb8-27"><a href="#cb8-27" aria-hidden="true" tabindex="-1"></a>        <span class="dt">Done</span> newf <span class="ot">-&gt;</span> <span class="dt">Queue</span> lenf newf <span class="dt">Idle</span> lenr r</span>
<span id="cb8-28"><a href="#cb8-28" aria-hidden="true" tabindex="-1"></a>        newstate <span class="ot">-&gt;</span> <span class="dt">Queue</span> lenf f newstate lenr r</span>
<span id="cb8-29"><a href="#cb8-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-30"><a href="#cb8-30" aria-hidden="true" tabindex="-1"></a><span class="ot">check ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">RotationState</span> a <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb8-31"><a href="#cb8-31" aria-hidden="true" tabindex="-1"></a>check <span class="op">!</span>lenf f state <span class="op">!</span>lenr r <span class="ot">=</span></span>
<span id="cb8-32"><a href="#cb8-32" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> lenr <span class="op">&lt;=</span> lenf <span class="kw">then</span> exec2 lenf f state lenr r</span>
<span id="cb8-33"><a href="#cb8-33" aria-hidden="true" tabindex="-1"></a>    <span class="kw">else</span> <span class="kw">let</span> newstate <span class="ot">=</span> <span class="dt">Reversing</span> <span class="dv">0</span> f [] r []</span>
<span id="cb8-34"><a href="#cb8-34" aria-hidden="true" tabindex="-1"></a>         <span class="kw">in</span> exec2 (lenf<span class="op">+</span>lenr) f newstate <span class="dv">0</span> []</span>
<span id="cb8-35"><a href="#cb8-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-36"><a href="#cb8-36" aria-hidden="true" tabindex="-1"></a><span class="ot">empty ::</span> <span class="dt">Queue</span> a</span>
<span id="cb8-37"><a href="#cb8-37" aria-hidden="true" tabindex="-1"></a>empty <span class="ot">=</span> <span class="dt">Queue</span> <span class="dv">0</span> [] <span class="dt">Idle</span> <span class="dv">0</span> []</span>
<span id="cb8-38"><a href="#cb8-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-39"><a href="#cb8-39" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> (<span class="dt">Maybe</span> a, <span class="dt">Queue</span> a)</span>
<span id="cb8-40"><a href="#cb8-40" aria-hidden="true" tabindex="-1"></a>dequeue q<span class="op">@</span>(<span class="dt">Queue</span> _ [] _ _ _) <span class="ot">=</span> (<span class="dt">Nothing</span>, q)</span>
<span id="cb8-41"><a href="#cb8-41" aria-hidden="true" tabindex="-1"></a>dequeue (<span class="dt">Queue</span> lenf (x<span class="op">:</span>f&#39;) state lenr r) <span class="ot">=</span></span>
<span id="cb8-42"><a href="#cb8-42" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> <span class="op">!</span>q&#39; <span class="ot">=</span> check (lenf<span class="op">-</span><span class="dv">1</span>) f&#39; (invalidate state) lenr r <span class="kw">in</span></span>
<span id="cb8-43"><a href="#cb8-43" aria-hidden="true" tabindex="-1"></a>    (<span class="dt">Just</span> x, q&#39;)</span>
<span id="cb8-44"><a href="#cb8-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-45"><a href="#cb8-45" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb8-46"><a href="#cb8-46" aria-hidden="true" tabindex="-1"></a>enqueue x (<span class="dt">Queue</span> lenf f state lenr r) <span class="ot">=</span> check lenf f state (lenr<span class="op">+</span><span class="dv">1</span>) (x<span class="op">:</span>r)</span></code></pre></div>
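<p>To see the parallel reversal concretely, here is a small standalone sketch. It copies <code>RotationState</code> and <code>exec</code> from the module above and adds a hypothetical helper, <code>runRotation</code> (not part of the module), that starts a rotation the same way <code>check</code> does and steps it with <code>exec</code> until it reaches <code>Done</code>, counting the steps. It assumes <code>length r == length f + 1</code>, which is the situation in which <code>check</code> starts a rotation; with other inputs <code>exec</code> can get stuck returning the same state and the helper would loop forever.</p>

```haskell
-- Copied from the HoodMelvilleQueue module above.
data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

-- Hypothetical helper: rotate front f and back r to completion,
-- returning (number of exec steps, resulting front list).
-- Assumes length r == length f + 1; otherwise exec may return the
-- same state forever and this loops.
runRotation :: [a] -> [a] -> (Int, [a])
runRotation f r = go 0 (Reversing 0 f [] r [])
  where
    go k (Done xs) = (k, xs)
    go k st        = k `seq` go (k + 1) (exec st)
```

<p>For example, <code>runRotation [1,2] [5,4,3]</code> evaluates to <code>(6, [1,2,3,4,5])</code>, i.e. <code>f ++ reverse r</code>. In general the rotation takes <code>2 * length f + 2</code> steps, and <code>exec2</code> spreads them over the queue operations two at a time.</p>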
<h3 id="okasakis-real-time-queues">Okasaki’s Real-Time Queues</h3>
<p>Just for completeness. This implementation <em>crucially</em> relies on lazy evaluation. Our queues are of
the form <code>Queue f r s</code>. If you look carefully, you’ll notice that the only place we consume <code>s</code> is
in the first clause of <code>exec</code>, and there we discard its elements. In other words, we only care about
the length of <code>s</code>. <code>s</code> gets “decremented” each time we <code>enqueue</code> or <code>dequeue</code> (both go through <code>exec</code>) until it’s empty, at which point we
rotate <code>r</code> to <code>f</code> in the second clause of <code>exec</code>. The key thing is that <code>f</code> and <code>s</code> are initialized
to the same value in that clause. That means each time we “decrement” <code>s</code> we are also forcing a bit
of <code>f</code>. Forcing a bit of <code>f</code>/<code>s</code> means computing a bit of <code>rotate</code>. <code>rotate xs ys a</code> is an
incremental version of <code>xs ++ reverse ys ++ a</code> (where we use the invariant
<code>length ys = 1 + length xs</code> for the base case).</p>
<p>Using Okasaki’s terminology, <code>rotate</code> illustrates a simple form of <strong>lazy rebuilding</strong> where we use
lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we
interleave the evaluation of <code>rotate</code> with <code>enqueue</code> and <code>dequeue</code> via forcing the conses of
<code>f</code>/<code>s</code>. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is
amortized optimal). We need to use Okasaki’s other technique of <strong>scheduling</strong> to strategically
force the thunks incrementally rather than all at once. Here <code>s</code> is a schedule telling us when to
force parts of <code>f</code>. (As mentioned, <code>s</code> also serves as a counter telling us when to perform a
rebuild.)</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">OkasakiQueue</span> ( <span class="dt">Queue</span>, empty, dequeue, enqueue ) <span class="kw">where</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> [a] <span class="op">!</span>[a] [a]</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="ot">empty ::</span> <span class="dt">Queue</span> a</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>empty <span class="ot">=</span> <span class="dt">Queue</span> [] [] []</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> (<span class="dt">Maybe</span> a, <span class="dt">Queue</span> a)</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>dequeue q<span class="op">@</span>(<span class="dt">Queue</span> [] _ _) <span class="ot">=</span> (<span class="dt">Nothing</span>, q)</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>dequeue (<span class="dt">Queue</span> (x<span class="op">:</span>f) r s) <span class="ot">=</span> (<span class="dt">Just</span> x, exec f r s)</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a><span class="ot">rotate ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a]</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a>rotate     [] (y<span class="op">:</span> _) a <span class="ot">=</span> y<span class="op">:</span>a</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a>rotate (x<span class="op">:</span>xs) (y<span class="op">:</span>ys) a <span class="ot">=</span> x<span class="op">:</span>rotate xs ys (y<span class="op">:</span>a)</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a><span class="ot">exec ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a>exec f <span class="op">!</span>r (_<span class="op">:</span>s) <span class="ot">=</span> <span class="dt">Queue</span> f r s</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a>exec f <span class="op">!</span>r [] <span class="ot">=</span> <span class="kw">let</span> f&#39; <span class="ot">=</span> rotate f r [] <span class="kw">in</span> <span class="dt">Queue</span> f&#39; [] f&#39;</span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a>enqueue x (<span class="dt">Queue</span> f r s) <span class="ot">=</span> exec f (x<span class="op">:</span>r) s </span></code></pre></div>
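<p>A small standalone check of the claim that <code>rotate xs ys a</code> computes <code>xs ++ reverse ys ++ a</code> when <code>length ys = 1 + length xs</code>, and that it does so incrementally. The <code>rotate</code> below is copied from the module above; <code>rotateSpec</code> and <code>firstElement</code> are hypothetical names introduced only for this sketch.</p>

```haskell
-- Copied from the OkasakiQueue module above.
rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

-- The non-incremental specification rotate is meant to meet,
-- assuming length ys == 1 + length xs.
rotateSpec :: [a] -> [a] -> [a] -> [a]
rotateSpec xs ys a = xs ++ reverse ys ++ a

-- Laziness: only the first step of rotate is needed to produce the head,
-- so the undefined tails below are never forced.
firstElement :: Int
firstElement = head (rotate (1 : undefined) (2 : undefined) [])
```

<p>Producing each cons of the result takes one step of <code>rotate</code>, which is what lets the schedule <code>s</code> pay for the rotation a little at a time.</p>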
<p>It’s instructive to compare the above to the following implementation which doesn’t use a schedule.
This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy
rebuilding to spread the <code>xs ++ reverse ys</code> (particularly the <code>reverse</code> part) over multiple
<code>dequeue</code>s via <code>rotate</code>. The following implementation performs extremely well in my benchmark, but
the operations are subtly not constant-time. Specifically, after a long series of <code>enqueue</code>s, a
<code>dequeue</code> will do work proportional to the logarithm of the number of <code>enqueue</code>s. Essentially, <code>f</code>
will be a nested series of <code>rotate</code> calls, one for every doubling of the length of the queue. Even
if we change <code>let f'</code> to <code>let !f'</code>, that will only make the first <code>dequeue</code> cheap. The second will
still be expensive.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">UnscheduledOkasakiQueue</span> ( <span class="dt">Queue</span>, empty, dequeue, enqueue ) <span class="kw">where</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">Queue</span> [a] <span class="op">!</span><span class="dt">Int</span> [a] <span class="op">!</span><span class="dt">Int</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="ot">empty ::</span> <span class="dt">Queue</span> a</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a>empty <span class="ot">=</span> <span class="dt">Queue</span> [] <span class="dv">0</span> [] <span class="dv">0</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> (<span class="dt">Maybe</span> a, <span class="dt">Queue</span> a)</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a>dequeue q<span class="op">@</span>(<span class="dt">Queue</span> [] _ _ _) <span class="ot">=</span> (<span class="dt">Nothing</span>, q)</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a>dequeue (<span class="dt">Queue</span> (x<span class="op">:</span>f) lenf r lenr) <span class="ot">=</span> (<span class="dt">Just</span> x, exec f (lenf <span class="op">-</span> <span class="dv">1</span>) r lenr)</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a><span class="ot">rotate ::</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [a]</span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a>rotate     [] (y<span class="op">:</span> _) a <span class="ot">=</span> y<span class="op">:</span>a</span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a>rotate (x<span class="op">:</span>xs) (y<span class="op">:</span>ys) a <span class="ot">=</span> x<span class="op">:</span>rotate xs ys (y<span class="op">:</span>a)</span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a><span class="ot">exec ::</span> [a] <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a>exec f <span class="op">!</span>lenf <span class="op">!</span>r <span class="op">!</span>lenr <span class="op">|</span> lenf <span class="op">&gt;=</span> lenr <span class="ot">=</span> <span class="dt">Queue</span> f lenf r lenr</span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a>exec f <span class="op">!</span>lenf <span class="op">!</span>r <span class="op">!</span>lenr <span class="ot">=</span> <span class="kw">let</span> f&#39; <span class="ot">=</span> rotate f r [] <span class="kw">in</span> <span class="dt">Queue</span> f&#39; (lenf <span class="op">+</span> lenr) [] <span class="dv">0</span></span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a <span class="ot">-&gt;</span> <span class="dt">Queue</span> a</span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true" tabindex="-1"></a>enqueue x (<span class="dt">Queue</span> f lenf r lenr) <span class="ot">=</span> exec f lenf (x<span class="op">:</span>r) (lenr <span class="op">+</span> <span class="dv">1</span>) </span></code></pre></div>
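<p>The claim that <code>f</code> ends up as one nested <code>rotate</code> per doubling can be checked by tracking only the two length counters. The sketch below (the name <code>rebuildsAfterEnqueues</code> is hypothetical) replays the length bookkeeping of this module’s <code>exec</code> for <code>n</code> consecutive <code>enqueue</code>s starting from <code>empty</code>, and counts how often the rebuilding branch fires. Since no <code>dequeue</code> forces anything in between, each rebuild wraps the still-unevaluated <code>f</code> in one more <code>rotate</code>, so this count is the nesting depth.</p>

```haskell
-- Count how many times the unscheduled queue's exec takes the rebuilding
-- branch (f' = rotate f r []) during n enqueues starting from empty.
rebuildsAfterEnqueues :: Int -> Int
rebuildsAfterEnqueues n = go n 0 0 0
  where
    -- go remainingEnqueues lenf lenr rebuildsSoFar
    go :: Int -> Int -> Int -> Int -> Int
    go 0 _ _ acc = acc
    go k lenf lenr acc
      | lenf >= lenr' = go (k - 1) lenf lenr' acc                -- no rebuild
      | otherwise     = go (k - 1) (lenf + lenr') 0 (acc + 1)    -- rebuild
      where
        lenr' = lenr + 1  -- enqueue conses onto r before calling exec
```

<p>Rebuilds happen when the queue reaches sizes 1, 3, 7, 15, …, i.e. |2^k - 1|, so after 1000 enqueues there are 9 nested <code>rotate</code>s.</p>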
<h3 id="empirical-evaluation">Empirical Evaluation</h3>
<p>I won’t reproduce the evaluation code as it’s not very sophisticated or interesting.
It randomly generated a sequence of enqueues and dequeues, choosing an enqueue with 80% probability
(and a dequeue otherwise) so that the queues would grow. It measured the average
time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.</p>
<p>The main thing I wanted to see was relatively stable average enqueue and dequeue
times with only the batched implementation having a growing maximum dequeue time.
This is indeed what I saw, though it took about 1,000,000 operations (or really
a queue of a couple hundred thousand elements) for the numbers to stabilize.</p>
<p>The results were mostly unsurprising. In overall time, the batched
implementation won. Its <code>enqueue</code> is also, obviously, the fastest. (Indeed, there’s
a good chance my measurement of its average enqueue time was largely a measurement
of the timer’s resolution.) The operations’ average times were stable, illustrating their
constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue
time versus the average stabilized around 7000 to 1, except, of course, for the
batched version which grew linearly to millions to 1 ratios at queue sizes of tens
of millions of elements. This illustrates the worst-case time complexity of all the
other implementations, and the merely amortized time complexity of the batched one.</p>
<p>While the batched version was best in overall time, the difference wasn’t that great.
The worst implementations were still less than 1.4x slower. All the worst-case optimal
implementations performed roughly the same, but there were still some clear winners
and losers. Okasaki’s real-time queue is almost on par with the batched
implementation in overall time and handily beats the other implementations in average
enqueue and dequeue times. The main surprise for me was that the loser was the
Hood-Melville queue. My guess is that this is due to <code>invalidate</code>, which seems like it
would do more work and produce more garbage than the approach taken in my functional
version.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The point of this article was to illustrate the process of deriving a deamortized
data structure from an amortized one utilizing batched rebuilding by explicitly
modeling global rebuilding as a coroutine.</p>
<p>The point wasn’t to produce the fastest queue implementation, though I am pretty
happy with the results. While this is an extremely simple example, it was still
nice that each step was very easy and natural. It’s especially nice that this
derivation approach produced a better result than the Hood-Melville queue.</p>
<p>Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue
with worst-case constant-time operations.</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>This code could definitely be refactored to leverage
this similarity to reduce code. Alternatively, one could refunctionalize
the <a href="#hood-melville-implementation">Hood-Melville implementation</a> at the end.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Going “too fast”, so long as it’s still a constant amount of
work for each step, isn’t really an issue asymptotically, so you can just
crank the knobs if you don’t want to think too hard about it. That said,
going faster than you need to will likely give you worse worst-case
constant factors. In some cases, going faster than necessary could reduce
constant factors, e.g. by better utilizing caches and disk I/O buffers.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2024-10-04 01:24:57-07:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/global-rebuilding-coroutines-and-defunctionalization.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>Morleyization</title>
    <link>https://www.hedonisticlearning.com/posts/morleyization.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Morleyization is a fairly important operation in categorical logic for which a statement and proof are
hard to find in readily accessible references. Most refer to D1.5.13 of “Sketches of an Elephant”, which is
not an accessible text. Another reference is 3.2.8 of “Accessible Categories” by Makkai and Paré, which is
more accessible but still a big ask for just a single theorem.</p>
<p>Here I reproduce the statement and proof from “Accessible Categories” albeit with some notational and
conceptual adaptations as well as some commentary. This assumes some basic familiarity with the ideas
and notions of traditional model theory, e.g. what structures, models, and |\vDash| are.</p>
<!--more-->
<h2 id="preliminaries">Preliminaries</h2>
<p>The context of the theorem is <a href="https://en.wikipedia.org/wiki/Infinitary_logic">infinitary, classical (multi-sorted) first-order logic</a>.
|L| will stand for a <strong>language</strong> aka a <strong>signature</strong>, i.e. sorts, function symbols, predicate symbols as usual,
except if we’re allowing infinitary quantification we may have function or predicate symbols of infinite
arity. We write |L_{\kappa,\lambda}| for the corresponding classical first-order logic where we
allow conjunctions and disjunctions indexed by sets of cardinality less than the regular (infinite) cardinal
|\kappa| while allowing quantification over sets of variables of (infinite) cardinality less than
|\lambda \leq \kappa|. |\lambda=\varnothing| is also allowed to indicate a propositional logic.
If |\kappa| or |\lambda| are |\infty|, that means conjunctions/disjunctions or quantifications over
arbitrary sets. |L_{\omega,\omega}| would be normal finitary, classical first-order logic. Geometric
logic would be a fragment of |L_{\infty,\omega}|. The theorem will focus on |L_{\infty,\infty}|,
but inspection of the proof shows that the theorem would hold for any reasonable choice of |\kappa|
and |\lambda|.</p>
<p>As a note, infinitary logics can easily have a <em>proper class</em> of formulas. Thus, it will make sense
to talk about <em>small</em> subclasses of formulas, i.e. ones which are sets.</p>
<p>Instead of considering logics with different sets of connectives, Makkai and Paré introduce the
fairly standard notion of a <strong>positive existential</strong> formula which is a formula that uses only
atomic formulas, conjunctions, disjunctions, and existential quantification. That is, no implication,
negation, or universal quantification. They then define a <strong>basic sentence</strong> as “a conjunction of
a set of sentences, i.e. closed formulas, each of which is of the form |\forall\vec x(\phi\to\psi)|
where |\phi| and |\psi| are [positive existential] formulas”.</p>
<p>It’s clear that the component formulas of a basic sentence correspond to sequents of the form
|\phi\vdash\psi| for open positive existential formulas. A basic sentence corresponds to what
is often called a theory, i.e. a set of sequents. Infinitary logic lets us smash a theory down
to a single formula, but I think the theory concept is clearer, though I’m sure there are benefits
to having a single formula. Instead of talking about basic sentences, we can talk about a theory
in the positive existential fragment of the relevant logic. This has the benefit that we don’t
need to introduce connectives or infinitary versions of connectives just for structural reasons.
I’ll call a theory that corresponds to a basic sentence a <strong>positive existential theory</strong> for
conciseness.</p>
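<p>Written out, the correspondence between a basic sentence and a positive existential theory looks like the following, where each |\phi_i| and |\psi_i| is positive existential and the subscript |\vec x_i| on |\vdash| records the context of free variables (a common convention in categorical logic that this post otherwise leaves implicit). The second line is an illustrative example using a hypothetical binary predicate symbol |E|, with |\top| (the empty conjunction) as an antecedent.</p>

```latex
\begin{aligned}
\bigwedge_{i \in I} \forall \vec x_i\,(\phi_i \to \psi_i)
  &\quad\longleftrightarrow\quad
  \{\, \phi_i \vdash_{\vec x_i} \psi_i \,\}_{i \in I} \\[1ex]
\forall x\,\forall y\,(E(x,y) \to \exists z\,E(y,z)) \land \forall x\,(\top \to \exists y\,E(x,y))
  &\quad\longleftrightarrow\quad
  \{\, E(x,y) \vdash_{x,y} \exists z\,E(y,z),\ \ \top \vdash_{x} \exists y\,E(x,y) \,\}
\end{aligned}
```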
<p>Makkai and Paré also define |L_{\kappa,\lambda}^*| “for the class of formulas |L_{\kappa,\lambda}|
which are conjunctions of formulas in each of which the only conjunctions occurring are
of cardinality |&lt; \lambda|”. For us, the main significance of this is that geometric theories
correspond to basic sentences in |L_{\infty,\omega}^*| as this limits the conjunctions to the
finitary case. Indeed, Makkai and Paré include the somewhat awkward sentence: “Thus, a <em>geometric
theory</em> is the same as a basic sentence in |L_{\infty,\omega}^*|, and a <em>coherent theory</em> is
a conjunction of basic sentences in |L_{\omega,\omega}|.” Presumably, the ambiguous meaning of
“conjunction” leads to the differences in how these are stated, i.e. a basic sentence is already
a “conjunction” of formulas.</p>
<p>The standard notions of |L|-structure and model are used, and I won’t give a precise definition
here. An <strong>|L|-structure</strong> assigns meaning (sets, functions, and relations) to all the sorts and
symbols of |L|, and a model of a formula (or theory) is an |L|-structure which satisfies the
formula (or all the formulas of the theory). We’ll write |Str(L)| for the category of |L|-structures
and homomorphisms. In categorical logic, an |L|-structure would usually
be some kind of structure-preserving (fibred) functor, typically into |\mathbf{Set}|, and a homomorphism
is a natural transformation. A formula would be mapped to a subobject, and a model would require
these subobjects to satisfy certain factoring properties specified by the theory. A sequent
|\varphi \vdash \psi| in the theory would require a model to have the interpretation of
|\varphi| factor through the interpretation of |\psi|, i.e. for the former to be a subset
of the latter when interpreting into |\mathbf{Set}|.</p>
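<p>For finite structures, this subset reading of a sequent can be made concrete. The sketch below is a hypothetical toy, not anything from Makkai and Paré: it is single-sorted, finitary, and relational (no function symbols), and it only handles positive existential formulas. A sequent in context |\vec x| holds in a structure when every assignment to |\vec x| satisfying the antecedent also satisfies the consequent, i.e. when the interpretation of the antecedent is a subset of that of the consequent. All names (<code>Formula</code>, <code>Structure</code>, <code>Sequent</code>, <code>holds</code>, <code>satisfies</code>, <code>isModel</code>) are made up for illustration.</p>

```haskell
type Var = String

-- Positive existential formulas over a single-sorted, relational signature.
-- Only finite conjunctions/disjunctions; no negation, implication, or forall.
data Formula
  = Atom String [Var]        -- R(x1, ..., xn)
  | Conj [Formula]           -- Conj [] plays the role of "true"
  | Disj [Formula]           -- Disj [] plays the role of "false"
  | Exists Var Formula

-- A finite L-structure: a carrier set plus the set of tuples for each predicate symbol.
data Structure = Structure
  { carrier   :: [Int]
  , relations :: [(String, [[Int]])]
  }

-- An assignment of carrier elements to variables; earlier entries shadow later ones.
type Env = [(Var, Int)]

holds :: Structure -> Env -> Formula -> Bool
holds m env (Atom r xs) =
  -- An unbound variable or an uninterpreted predicate symbol makes the atom false.
  case (traverse (`lookup` env) xs, lookup r (relations m)) of
    (Just args, Just tuples) -> args `elem` tuples
    _                        -> False
holds m env (Conj fs)    = all (holds m env) fs
holds m env (Disj fs)    = any (holds m env) fs
holds m env (Exists x f) = any (\a -> holds m ((x, a) : env) f) (carrier m)

-- A sequent phi |-_xs psi in the context xs.
data Sequent = Sequent [Var] Formula Formula

-- M satisfies the sequent iff every assignment to xs satisfying phi satisfies psi,
-- i.e. the interpretation of phi is a subset of the interpretation of psi.
satisfies :: Structure -> Sequent -> Bool
satisfies m (Sequent xs phi psi) = all ok assignments
  where
    assignments = map (zip xs) (mapM (const (carrier m)) xs)
    ok env = not (holds m env phi) || holds m env psi

-- A model of a theory (a set of sequents) satisfies all of them.
isModel :: Structure -> [Sequent] -> Bool
isModel m = all (satisfies m)
```

<p>For instance, with carrier <code>[0,1,2]</code> and <code>E</code> interpreted as <code>[[0,1],[1,2]]</code>, the sequent |E(x,y) \vdash_{x,y} \exists z\,E(y,z)| fails (nothing follows <code>2</code>), while |E(x,y) \vdash_{x,y} \exists z\,E(z,y)| holds.</p>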
<h2 id="theorem-statement">Theorem Statement</h2>
<p>|\mathcal F \subseteq L_{\infty,\infty}| is called a <strong>fragment of |L_{\infty,\infty}|</strong> if:</p>
<ol type="i">
<li>it contains all atomic formulas of |L|,</li>
<li>it is closed under substitution,</li>
<li>if a formula is in |\mathcal F| then so are all its subformulas,</li>
<li>if |\forall\vec x\varphi \in \mathcal F|, then so is |\neg\exists\vec x\neg\varphi|, and</li>
<li>if |\varphi\to\psi \in \mathcal F|, then so is |\neg\varphi\lor\psi|.</li>
</ol>
<p>Basically, formulas in |\mathcal F| are like “compound atomic formulas”, with the caveat that we must
include the classically equivalent versions of |\forall| and |\to| in terms of |\neg| and |\exists| or
|\lor| respectively. The motivation for this will become clear shortly.</p>
<p>Given |\mathcal F|, we define an <strong>|\mathcal F|-basic sentence</strong> exactly like a basic sentence
except that we allow formulas from |\mathcal F| instead of just atomic formulas as the base
case. In the language of theories, an |\mathcal F|-basic sentence is a theory, i.e. a set of sequents,
using only the connectives |\bigwedge|, |\bigvee|, and |\exists|, except within subformulas
contained in |\mathcal F| which may use any (first-order) connective. We’ll call such a
theory a <strong>positive existential |\mathcal F|-theory</strong>. Much of the following will be double-barrelled
as I try to capture the proof as stated in “Accessible Categories” and my slight reformulation
using positive existential theories.</p>
<p>|\mathrm{Mod}^{(\mathcal F)}(\mathbb T)| for a theory |\mathbb T| (or
|\mathrm{Mod}^{(\mathcal F)}(\sigma)| for a basic sentence |\sigma|) is the category
whose objects are |L|-structures that are models of |\mathbb T| (or |\sigma|), and whose arrows are the
|\mathcal F|-elementary mappings. For <em>any</em> subset |\mathcal F| of the formulas of |L_{\infty,\infty}|,
an <strong>|\mathcal F|-elementary mapping</strong> |h : M \to N| is a mapping of |L|-structures
which preserves the meaning of all formulas in |\mathcal F|.
That is, |M \vDash \varphi(\vec a)| implies |N \vDash \varphi(h(\vec a))| for all
formulas |\varphi \in \mathcal F| and appropriate sequences |\vec a|. We can define
the <strong>elementary mappings for a language |L’|</strong> as the |\mathcal F’|-elementary mappings
where |\mathcal F’| consists of (only) the atomic formulas of |L’|. |\mathrm{Mod}^{(L’)}(\mathbb T’)|
(or |\mathrm{Mod}^{(L’)}(\sigma’)|) can be defined as |\mathrm{Mod}^{(\mathcal F’)}(\mathbb T’)|
(or |\mathrm{Mod}^{(\mathcal F’)}(\sigma’)|) for the |\mathcal F’| determined this way.</p>
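<p>As a hedged illustration of how the choice of |\mathcal F| changes the morphisms, the following Haskell sketch checks the |\mathcal F|-elementary condition by brute force on finite structures. The toy language (a single unary predicate |R|, formulas in one free variable built from |\neg|, |\land|, |\lor|) and all names are assumptions made for the example.</p>

```haskell
-- Hypothetical toy language: one unary predicate R; formulas have the single
-- free variable x and are built with not, and, or.
data Formula = R | Not Formula | And Formula Formula | Or Formula Formula
  deriving (Eq, Show)

-- A finite L-structure: a domain and the interpretation of R.
data Structure = Structure { domain :: [Int], relR :: Int -> Bool }

-- M |= phi(a)
holds :: Structure -> Formula -> Int -> Bool
holds m R         a = relR m a
holds m (Not p)   a = not (holds m p a)
holds m (And p q) a = holds m p a && holds m q a
holds m (Or p q)  a = holds m p a || holds m q a

-- h : M -> N is F-elementary iff M |= phi(a) implies N |= phi(h a)
-- for every phi in fs and every a in the domain of M.
isElementary :: [Formula] -> Structure -> Structure -> (Int -> Int) -> Bool
isElementary fs m n h =
  all (\phi -> all (\a -> not (holds m phi a) || holds n phi (h a)) (domain m)) fs
```

<p>With |M = \{0,1\}|, |R^M = \{0\}| and |N = \{0,1\}|, |R^N = \{0,1\}|, the identity preserves |R| (it is a homomorphism) but is not |\{R, \neg R\}|-elementary, since |M \vDash \neg R(1)| while |N \not\vDash \neg R(1)|. When both |\varphi| and |\neg\varphi| are in |\mathcal F|, an |\mathcal F|-elementary mapping must reflect |\varphi| as well as preserve it.</p>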
<p>Here’s the theorem as stated in “Accessible Categories”.</p>
<blockquote>
<p><strong>Theorem</strong> (Proposition 3.2.8): Given any small fragment |\mathcal F| and an |\mathcal F|-basic
sentence |\sigma|, the category of |\mathrm{Mod}^{(\mathcal F)}(\sigma)| is equivalent to
|\mathrm{Mod}^{(L’)}(\sigma’)| for some other language |L’| and basic sentence |\sigma’| over
|L’|, hence by 3.2.1, to the category of models of a small sketch as well.</p>
</blockquote>
<p>We’ll replace the |\mathcal F|-basic sentences |\sigma| and |\sigma’| with positive existential
|\mathcal F|-theories |\mathbb T| and |\mathbb T’|.</p>
<p>Implicit here is that |\mathcal F \subseteq L_{\infty,\infty}| consists of formulas over |L|, whereas
|\sigma’| is over |L’|; |L| and |L’| may be distinct and usually will be. As the proof will show, they agree
on sorts and function symbols, but we have different predicate symbols in |L’|.</p>
<p>I’ll be ignoring the final comment referencing Theorem 3.2.1. Theorem 3.2.1 is the main theorem of
the section and states that every small sketch gives rise to a language |L| and theory |\mathbb T|
(or basic sentence |\sigma|) and vice versa such that the category of models of the sketch is
equivalent to the category of models of |\mathbb T| (or |\sigma|). Thus, the final comment is an immediate corollary.</p>
<p>For us, the interesting part of 3.2.8 is that it takes a classical first-order theory, |\mathbb T|,
and produces a positive existential theory, as represented by |\mathbb T’|, that has
an equivalent, in fact isomorphic, category of models. This positive existential theory is called
the <strong>Morleyization</strong> of the first-order theory.</p>
<p>In particular, if we have a <em>finitary</em> classical first-order theory, then we get a coherent theory
with the same models. This means that, to study models of classical first-order theories, it’s enough
to study models of coherent theories via the Morleyization of the classical first-order theories.
This allows many techniques for geometric and coherent theories to be applied, e.g. (pre)topos theory
and classifying toposes. As stated before, the theorem statement doesn’t actually make it clear that
the result holds for a restricted degree of “infinitariness”, but this is obvious from the proof.</p>
<h2 id="proof">Proof</h2>
<p>I’ll quote the first few sentences of the proof to which I have nothing to add.</p>
<blockquote>
<p>The idea is to replace each formula in |\mathcal F| by a new predicate. Let the
sorts of the language |L’| be the same as those of |L|, and similarly for the [function]
symbols.</p>
</blockquote>
<p>The description of the predicate symbols is complicated by their (potential) infinitary nature.
I’ll quote the proof here as well as I have nothing to add and am not as interested in this case.
The finitary quantifiers case would be similar, just slightly less technical. It would be even
simpler if we defined formulas in a given (ordered) variable context as is typical in categorical
logic.</p>
<blockquote>
<p>With any formula |\phi(\vec x)| in |\mathcal F|, with |\vec x| the repetition free sequence
|\langle x_\beta\rangle_{\beta&lt;\alpha}| of exactly the free variables of |\phi| in a
once and for all fixed order of variables, let us associate the new [predicate] symbol |P_\phi|
of arity |a : \alpha \to \mathrm{Sorts}| such that |a(\beta) = x_\beta|. The [predicate]
symbols of |L’| are the |P_\phi| for all |\phi\in\mathcal F|.</p>
</blockquote>
<p>The motivation for |\mathcal F|-basic sentences / positive existential |\mathcal F|-theories should
now be totally clear. The |\mathcal F|-basic sentences / positive existential |\mathcal F|-theories
are literally basic sentences / positive existential theories in the language |L’| if we
replace all occurrences of subformulas in |\mathcal F| with their corresponding predicate symbols in |L’|.</p>
<p>We can extend any |L|-structure |M| to an |L’|-structure |M^\sharp| such that they agree on all the sorts
and function symbols of |L|, and |M^\sharp| satisfies |M^\sharp \vDash P_\varphi(\vec a)| if and only if
|M \vDash \varphi(\vec a)|. That is, we <em>define</em> the interpretation of |P_\varphi| to
be the subset of the interpretation of its domain specified by |M \vDash \varphi(\vec a)| for all
|\vec a| in the domain. In more categorical language, we define the subobject that |P_\varphi|
gets sent to to be the subobject |\varphi|.</p>
<p>We can define an |L|-structure, |N^\flat|, for |N| an |L’|-structure by, again, requiring it to do the
same thing to sorts and function symbols as |N|, and defining the interpretation of the predicate
symbols as |N^\flat \vDash R(\vec a)| if and only if |N \vDash P_{R(\vec x)}(\vec a)|.</p>
<p>We immediately have |(M^\sharp)^\flat = M|.</p>
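<p>In the finitary, propositional case the two constructions are short enough to write down. Here is a minimal Haskell sketch, assuming |L|-structures are valuations of named atoms and that the predicate symbols |P_\varphi| of |L’| are indexed by the formulas |\varphi| themselves; the names <code>eval</code>, <code>sharp</code>, and <code>flat</code> are hypothetical.</p>

```haskell
-- Propositional formulas over named atoms (the finitary, propositional case).
data Formula
  = Atom String
  | Top
  | Not Formula
  | And Formula Formula
  | Or Formula Formula
  deriving (Eq, Show)

-- An L-structure: a truth value for each atom R.
type LStructure = String -> Bool

-- An L'-structure: a truth value for each predicate symbol P_phi,
-- where the symbols are indexed by the formulas phi themselves.
type LPrimeStructure = Formula -> Bool

-- M |= phi
eval :: LStructure -> Formula -> Bool
eval m (Atom r)  = m r
eval _ Top       = True
eval m (Not p)   = not (eval m p)
eval m (And p q) = eval m p && eval m q
eval m (Or p q)  = eval m p || eval m q

-- M^sharp |= P_phi iff M |= phi
sharp :: LStructure -> LPrimeStructure
sharp m phi = eval m phi

-- N^flat |= R iff N |= P_{R}
flat :: LPrimeStructure -> LStructure
flat n r = n (Atom r)
```

<p>Checking |(M^\sharp)^\flat = M| amounts to checking <code>flat (sharp m) r == m r</code> for each atom <code>r</code>. The other round trip can fail: an <code>n</code> that makes |P_\top| false has <code>sharp (flat n) Top == True</code>, which differs from <code>n Top</code>.</p>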
<p>We can extend this to |L’|-formulas. Given an |L’|-formula |\psi|, |\psi^\flat| is defined
by a connective-preserving operation for which we only need to specify the action on predicate
symbols. We define it by declaring that |P_\varphi(\vec t)^\flat| is |\varphi(\vec t)|.
We extend |\flat| to theories via |\mathbb T’^\flat \equiv \{ \varphi^\flat \vdash \psi^\flat \mid (\varphi\vdash\psi) \in \mathbb T’\}|.
A similar induction allows us to prove \[M\vDash\psi^\flat(\vec a)\iff M^\sharp\vDash\psi(\vec a)\]
for all |L|-structures |M| and appropriate |\vec a|.</p>
<p>We have |\mathbb T = \mathbb T’^\flat| for a positive existential theory |\mathbb T’| over |L’|
(or |\sigma = \rho^\flat| for a basic |L’|-sentence |\rho|)
and thus |\varphi^\flat \vDash_M \psi^\flat \iff \varphi \vDash_{M^\sharp}\psi|
for all |\varphi\vdash\psi \in \mathbb T’| (or |M \vDash\sigma \iff M^\sharp\vDash\rho|).
We want to make it so that any |L’|-structure |N| interpreting |\mathbb T’| (or |\rho|) as |\mathbb T|
(or |\sigma|) is of the form |N = M^\sharp| for some |M|. Right now that doesn’t happen because, while
the definition of |M^\sharp| forces it to respect the logical connectives in the formula |\varphi|
associated to the |L’| predicate symbol |P_\varphi|, this isn’t required for an arbitrary model |N|.
For example, nothing requires |N \vDash P_\top| to hold.</p>
<p>The solution is straightforward. In addition to |\mathbb T’| (or |\rho|) representing
the theory |\mathbb T| (or |\sigma|), we add in an additional set of axioms |\Phi|
that capture the behavior of the (encoded) logical connectives of the formulas associated to the
predicate symbols.</p>
<p>These axioms are largely structural with a few exceptions that I’ll address separately. I’ll present
this as a collection of sequents for a theory, but we can replace |\vdash| and |\dashv \vdash| with
|\to| and |\leftrightarrow| for the basic sentence version. |\varphi \dashv\vdash \psi| stands
for two sequents going opposite directions.</p>
<p>\[\begin{align}
\varphi(\vec x) &amp; \dashv\vdash P_\varphi(\vec x) \tag{for atomic $\varphi$} \\
P_{R(\vec x)}(\vec t) &amp; \dashv\vdash P_{R(\vec t)}(\vec y) \tag{for terms $\vec t$ with free variables $\vec y$} \\
P_{\bigwedge\Sigma}(\vec x) &amp; \dashv\vdash \bigwedge_{\varphi \in \Sigma} P_\varphi(\vec x_\varphi) \tag{$\vec x_\varphi$ are the free variables of $\varphi$} \\
P_{\bigvee\Sigma}(\vec x) &amp; \dashv\vdash \bigvee_{\varphi \in \Sigma} P_\varphi(\vec x_\varphi) \tag{$\vec x_\varphi$ are the free variables of $\varphi$} \\
P_{\exists\vec y.\varphi(\vec x,\vec y)}(\vec x) &amp; \dashv\vdash \exists\vec y.P_{\varphi(\vec x,\vec y)}(\vec x,\vec y)
\end{align}\]</p>
<p>We then have two axiom schemas that eliminate the |\forall| and |\to| by leveraging the defining
property of |\mathcal F| being a fragment.</p>
<p>\[\begin{align}
P_{\forall\vec y.\varphi(\vec x,\vec y)}(\vec x) &amp; \dashv\vdash P_{\neg\exists\vec y.\neg\varphi(\vec x,\vec y)}(\vec x) \\
P_{\varphi\to\psi}(\vec x) &amp; \dashv\vdash P_{\neg\varphi}(\vec x) \lor P_\psi(\vec x)
\end{align}\]</p>
<p>We avoid needing negation by axiomatizing that |P_{\neg\varphi}| is the complement to |P_\varphi|. <strong>This
is arguably the key idea.</strong> Once we can simulate the behavior of negation without actually needing it, then
it is clear that we can embed all the other non-positive-existential connectives.</p>
<p>\[\begin{align}
&amp; \vdash P_{\neg\varphi}(\vec x) \lor P_\varphi(\vec x) \\
P_{\neg\varphi}(\vec x) \land P_\varphi(\vec x) &amp; \vdash \bot
\end{align}\]</p>
<p>|\Phi| is the set of all these sequents. (For the basic sentence version, |\Phi| is the set of universal
closures of all these formulas for all |\varphi,\psi \in \mathcal F|.)</p>
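<p>To see the schema in action, here is a Haskell sketch that generates |\Phi| for a finite, propositional fragment and checks it against |L’|-structures. It assumes a propositional toy setting: no quantifiers (so no |\exists| or |\forall| axioms), no substitution axiom, and no equality, so bare atoms contribute no axioms. |\top| is treated as the empty conjunction, giving |P_\top \dashv\vdash \bigwedge\varnothing|. All names are hypothetical.</p>

```haskell
-- Propositional formulas of a (finite) fragment F.
data Formula
  = Atom String
  | Top
  | Not Formula
  | And Formula Formula
  | Or Formula Formula
  | Imp Formula Formula
  deriving (Eq, Show)

-- Positive formulas of L': predicate symbols P_phi and finite conjunctions and
-- disjunctions (PAnd [] plays the role of top, POr [] of bottom).
data Pos = P Formula | PAnd [Pos] | POr [Pos]
  deriving Show

-- A sequent  lhs |- rhs.
type Sequent = (Pos, Pos)

-- The axioms Phi for each formula in the fragment.
phiAxioms :: [Formula] -> [Sequent]
phiAxioms = concatMap axiomsFor
  where
    biEntails a b = [(a, b), (b, a)]
    axiomsFor phi = case phi of
      Atom _  -> []  -- no equality in this toy, so atoms get no axiom
      Top     -> biEntails (P phi) (PAnd [])
      And p q -> biEntails (P phi) (PAnd [P p, P q])
      Or p q  -> biEntails (P phi) (POr [P p, P q])
      Imp p q -> biEntails (P phi) (POr [P (Not p), P q])
      Not p   -> [ (PAnd [], POr [P phi, P p])    -- |- P_{not p} \/ P_p
                 , (PAnd [P phi, P p], POr []) ]  -- P_{not p} /\ P_p |- bottom

-- Truth of positive formulas and sequents in an L'-structure n.
evalPos :: (Formula -> Bool) -> Pos -> Bool
evalPos n (P phi)   = n phi
evalPos n (PAnd ps) = all (evalPos n) ps
evalPos n (POr ps)  = any (evalPos n) ps

satisfies :: (Formula -> Bool) -> Sequent -> Bool
satisfies n (lhs, rhs) = not (evalPos n lhs) || evalPos n rhs

-- M |= phi for a valuation m of the atoms; eval m is then M^sharp.
eval :: (String -> Bool) -> Formula -> Bool
eval m (Atom r)  = m r
eval _ Top       = True
eval m (Not p)   = not (eval m p)
eval m (And p q) = eval m p && eval m q
eval m (Or p q)  = eval m p || eval m q
eval m (Imp p q) = not (eval m p) || eval m q
```

<p>For the fragment |\{p, q, \top, \neg p, \neg p \lor q, p \to q\}|, |M^\sharp| satisfies every axiom, while an |L’|-structure that agrees with |M^\sharp| except for making |P_\top| false violates |\vdash P_\top|.</p>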
<p>Another straightforward structural induction over the subformulas of |\varphi\in\mathcal F| shows that
\[N^\flat \vDash \varphi(\vec a) \iff N \vDash P_\varphi(\vec a)\] for any |L’|-structure |N|
which is a model of |\Phi|. The only interesting case is the negation case. Here, the induction hypothesis
states that |N^\flat\vDash\varphi(\vec a)| agrees with |N\vDash P_\varphi(\vec a)| and the axioms
state that |N\vDash P_{\neg\varphi}(\vec a)| is the complement of the latter which thus agrees with the
complement of the former which is |N^\flat\vDash\neg\varphi(\vec a)|.</p>
<p>From this, it follows that |N = M^\sharp| for |M = N^\flat| or, equivalently, |N = (N^\flat)^\sharp|.</p>
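<p>In the propositional case we can confirm this by brute force: enumerate every |L’|-structure on a finite fragment, keep the models of |\Phi|, and check that each one equals |(N^\flat)^\sharp|. The sketch below reads each biconditional of |\Phi| as an equality of truth values and the two negation sequents as “exactly one of |P_{\neg\varphi}|, |P_\varphi| holds”; it omits |\to| for brevity, and its names are hypothetical.</p>

```haskell
import Data.Maybe (fromMaybe)

-- Propositional fragment formulas (no implication, for brevity).
data Formula
  = Atom String
  | Top
  | Not Formula
  | And Formula Formula
  | Or Formula Formula
  deriving (Eq, Show)

-- M |= phi
eval :: (String -> Bool) -> Formula -> Bool
eval m (Atom r)  = m r
eval _ Top       = True
eval m (Not p)   = not (eval m p)
eval m (And p q) = eval m p && eval m q
eval m (Or p q)  = eval m p || eval m q

-- N^flat |= R iff N |= P_{R}
flat :: (Formula -> Bool) -> String -> Bool
flat n r = n (Atom r)

-- The semantic content of Phi at one formula of the fragment.
respects :: (Formula -> Bool) -> Formula -> Bool
respects n phi = case phi of
  Atom _  -> True
  Top     -> n Top
  Not p   -> n phi /= n p  -- exactly one of P_{not p}, P_p holds
  And p q -> n phi == (n p && n q)
  Or p q  -> n phi == (n p || n q)

-- Every L'-structure on the fragment: every choice of truth values for the P_phi.
allLPrime :: [Formula] -> [Formula -> Bool]
allLPrime fs = map toStructure (mapM (const [False, True]) fs)
  where
    toStructure bs phi = fromMaybe False (lookup phi (zip fs bs))

-- The L'-structures that are models of Phi.
modelsOfPhi :: [Formula] -> [Formula -> Bool]
modelsOfPhi fs = filter (\n -> all (respects n) fs) (allLPrime fs)
```

<p>For the fragment |\{p, q, \top, \neg p, p \land q, \neg p \lor q\}|, exactly 4 of the |2^6| |L’|-structures are models of |\Phi|, one for each valuation of |p| and |q|, matching the bijection with |L|-structures.</p>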
<p>|({-})^\sharp| and |({-})^\flat| thus establish a bijection between the objects of
|\mathrm{Mod}^{(\mathcal F)}(\mathbb T)| (or |\mathrm{Mod}^{(\mathcal F)}(\sigma)|) and
|\mathrm{Mod}^{(L’)}(\mathbb T’\cup\Phi)| (or |\mathrm{Mod}^{(L’)}(\bigwedge(\{\rho\}\cup\Phi))|).
The morphisms of these two categories are each subclasses of the morphisms of |Str(L_0)|, where |L_0| is
the language consisting of only the sorts and function symbols of |L| and thus of |L’|. We can show that they
are identical subclasses, which basically comes down to showing that an elementary mapping of
|\mathrm{Mod}^{(L’)}(\mathbb T’\cup\Phi)| (or |\mathrm{Mod}^{(L’)}(\bigwedge(\{\rho\}\cup\Phi))|)
is an |\mathcal F|-elementary mapping.</p>
<p>The idea is that such a morphism is a map |h : N \to N’| in |Str(L_0)| which must satisfy
\[N \vDash P_\varphi(\vec a) \implies N’ \vDash P_\varphi(h(\vec a))\] for
all |\varphi \in \mathcal F| and appropriate |\vec a|. However, since |N = (N^\flat)^\sharp|
and |P_\varphi(\vec a)^\flat = \varphi(\vec a)|, we have |N^\flat \vDash \varphi(\vec a) \iff N \vDash P_\varphi(\vec a)|
and similarly for |N’|. Thus
\[N^\flat \vDash \varphi(\vec a) \implies N’^\flat \vDash \varphi(h(\vec a))\] for all
|\varphi \in \mathcal F|, and every such |h| corresponds to an |\mathcal F|-elementary mapping.
Choosing |N = M^\sharp| allows us to show the converse for any |\mathcal F|-elementary
mapping |g : M \to M’|. |\square|</p>
<h3 id="commentary">Commentary</h3>
<p>The proof doesn’t particularly care that we’re interpreting the models into |\mathbf{Set}| and would
work just as well if we interpreted into some other category with the necessary structure. The amount
of structure required would vary with how much “infinitariness” we actually used, though it would need
to be a Boolean category. In particular, the proof works as stated (in its theory form) without
requiring any infinitary connectives when mapping finitary classical first-order logic to coherent logic.</p>
<p>We could simplify the statement and the proof by first eliminating |\forall| and |\to| and then
considering the proof over classical first-order logic with the connectives
|\{\bigwedge,\bigvee,\exists,\neg\}|. This would simplify the definition of fragment and
remove some cases in the proof.</p>
<p>To reiterate, <strong>the key is how we handle negation</strong>.</p>
<h2 id="defunctionalization">Defunctionalization</h2>
<p>Morleyization is related to <a href="https://en.wikipedia.org/wiki/Defunctionalization">defunctionalization</a><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.
For simplicity, I’ll only consider the finitary, propositional case, i.e. |L_{\omega,\varnothing}|.</p>
<p>In this case, we can consider each |P_\varphi| to be a new data type. In most cases, it would be
a <code>newtype</code> to use Haskell terminology. The only non-trivial case is |P_{\neg\varphi}|. Now, the
computational interpretation of classical propositional logic would use control operators to handle
negation. Propositional coherent logic, however, has a straightforward (first-order) functional
interpretation. Here, a negated formula, |\neg\varphi|, is represented by a primitive type
|P_{\neg\varphi}|.</p>
<p>The |P_{\neg\varphi} \land P_\varphi \vdash \bot| sequent is the <code>apply</code>
function for the defunctionalized continuation (of type |\varphi|). Even more clearly, this
is interderivable with |P_{\neg\varphi} \land \varphi’ \vdash \bot| where |\varphi’| is
the same as |\varphi| except the most shallow negated subformulas are replaced with the corresponding
predicate symbols. In particular, if |\varphi| contains no negated subformulas, then |\varphi’=\varphi|.
We have no way of creating new values of |P_{\neg\varphi}| other than via whatever sequents have been given.
We can, potentially, get a value of |P_{\neg\varphi}| by case analyzing on |\vdash \mathsf{lem}_\varphi : P_{\neg\varphi}\lor P_\varphi|.</p>
<p>What this corresponds to is a first-order functional language with a primitive type for each negated formula.
Any semantics/implementation for this will need to decide if the primitive type |P_{\neg\varphi}| is
empty or not, and then implement |\mathsf{lem}_\varphi| appropriately (or allow inconsistency). A
programmer writing a program in this signature, however, cannot assume either way whether |P_{\neg\varphi}|
is empty unless they can create a program with that type.</p>
<p>As a very slightly non-trivial example, let’s consider implementing |A \to P_{\neg\neg A}|
corresponding to double-negation introduction. Using Haskell-like syntax, the program looks like:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">proof ::</span> <span class="dt">A</span> <span class="ot">-&gt;</span> <span class="dt">NotNotA</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>proof a <span class="ot">=</span> <span class="kw">case</span> lem_NotA <span class="kw">of</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>            <span class="dt">Left</span> notNotA <span class="ot">-&gt;</span> notNotA</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>            <span class="dt">Right</span> notA <span class="ot">-&gt;</span> absurd (apply_NotA (notA, a))</span></code></pre></div>
<p>where <code>lem_NotA :: Either NotNotA NotA</code>, <code>apply_NotA :: (NotA, A) -&gt; Void</code>, and <code>absurd :: Void -&gt; a</code>
is the eliminator for |\bot| where |\bot| is represented by <code>Void</code>.</p>
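<p>The program above can be made to compile by abstracting over the primitives. In this sketch (one way to model it, not the only option), a record holds <code>lem_NotA</code> and <code>apply_NotA</code> (spelled <code>lemNotA</code> and <code>applyNotA</code>), and the types |A|, |P_{\neg A}|, |P_{\neg\neg A}| are type parameters the program cannot inspect. The two hypothetical implementations correspond to the two choices a semantics can make about whether |P_{\neg A}| is empty.</p>

```haskell
import Data.Void (Void, absurd)

-- The primitives available to a program. The type variables stand for
-- A, P_{not A} and P_{not not A}; the program cannot inspect them.
data Signature a notA notNotA = Signature
  { lemNotA   :: Either notNotA notA  -- P_{not not A} \/ P_{not A}
  , applyNotA :: (notA, a) -> Void    -- P_{not A} /\ A |- bottom
  }

-- The program from the text, written against an arbitrary implementation.
proof :: Signature a notA notNotA -> a -> notNotA
proof sig a = case lemNotA sig of
  Left notNotA -> notNotA
  Right notA   -> absurd (applyNotA sig (notA, a))

-- One implementation: A is inhabited, so P_{not A} is empty (Void)
-- and P_{not not A} is inhabited (()).
inhabitedA :: Signature () Void ()
inhabitedA = Signature { lemNotA = Left (), applyNotA = \(v, ()) -> absurd v }

-- The other: A is empty, so P_{not A} is () and P_{not not A} is Void.
emptyA :: Signature Void () Void
emptyA = Signature { lemNotA = Right (), applyNotA = \((), a) -> a }
```

<p>Since <code>proof</code> is written against an arbitrary <code>Signature</code>, it type-checks with either implementation, mirroring the point that the programmer cannot assume whether <code>NotA</code> is inhabited.</p>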
<p>Normally in defunctionalization we’d also be adding constructors to our new types for all the
occurrences of lambdas (or maybe |\mu|s would be better in this case). However, since the only
thing we can do (in general) with <code>NotA</code> is use <code>apply_NotA</code> on it, no information can be extracted
from it. Either it’s inhabited and behaves like <code>()</code>, i.e. |\top|, or it’s not inhabited and
behaves like <code>Void</code>, i.e. |\bot|. We can even test for this by case analyzing on <code>lem_A</code>, which
makes sense because in classical logic this formula was decidable.</p>
<h2 id="bonus-grothendieck-toposes-as-categories-of-models-of-sketches">Bonus: Grothendieck toposes as categories of models of sketches</h2>
<p>The main point of this section of “Accessible Categories” is to show that we can equivalently
view categories of models of <a href="https://ncatlab.org/nlab/show/sketch">sketches</a>
as categories of models of theories. In particular, models of <strong>geometric sketches</strong>, those whose
cone diagrams are finite but cocone diagrams are arbitrary, correspond to models of <a href="https://ncatlab.org/nlab/show/geometric+theory">geometric theories</a>.</p>
<p>We can view a <a href="https://ncatlab.org/nlab/show/site">site</a>, |(\mathcal C, J)|, for a Grothendieck topos as the
data of a geometric sketch. In particular, |\mathcal C| becomes the underlying category of the sketch, we
add cones to capture all finite limits, and the coverage, |J|, specifies the cocones. These cocones
have a particular form as the <a href="https://ncatlab.org/nlab/show/%CE%BA-ary+exact+category#sinks_and_relations">quotient of the kernel of a sink</a>
as specified by the sieves in |J|. (We need to use the apex of the cones representing pullbacks instead
of actual pullbacks.)</p>
<p>Lemma 3.2.2 shows the sketch-to-theory implication. The main thing I want to note about its proof is that
it illustrates how infinitely large cones would require infinitary (universal) quantification (in addition
to the unsurprising need for infinitary conjunction), but infinitely large cocones <em>do not</em> (but they do
require infinitary disjunction). I’ll not reproduce it here, but it comes down to writing out the normal
set-theoretic constructions of limits and colimits (in |\mathbf{Set}|), but instead of using some first-order
theory of sets, like <strong>ZFC</strong>, uses of sets would be replaced with (infinitary) logical operations. The
“infinite tuples” of an infinite limit become universal quantification over an infinitely large number of
free variables. For the colimits, though, the most complex use of quantifiers is an infinite disjunction of
increasingly deeply nested quantifiers to represent the transitive closure of a relation, but no single
disjunct is infinitary. Figuring out the infinitary formulas is a good exercise.</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>An
even more direct connection to defunctionalization is the fact that geometric logic is the internal logic
of Grothendieck toposes, but Grothendieck toposes are elementary toposes and so have the structure to model
implication and universal quantification. It’s just that those connectives aren’t preserved by geometric
morphisms. For implication, the idea is that |A \to B| is represented by
|\bigvee\{\bigwedge\Gamma\mid \Gamma,A\vdash B\}| where |\Gamma| is finite. We can even see how
a homomorphism that preserved geometric logic structure will fail to preserve this definition of |\to|.
Specifically, there could be additional contexts, not in the image of the homomorphism, that <em>should</em> be
included in the image of the disjunction for it to yield |\to| in the target, but won’t be.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2024-07-18 19:35:18-07:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/morleyization.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>The Pullback Lemma in Gory Detail (Redux)</title>
    <link>https://www.hedonisticlearning.com/posts/the-pullback-lemma-in-gory-detail-redux.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Andrej Bauer has a paper titled <a href="http://math.andrej.com/wp-content/uploads/2012/05/pullback.pdf">The pullback lemma in gory detail</a>
that goes over the proof of the <a href="https://ncatlab.org/nlab/show/pasting+law+for+pullbacks">pullback lemma</a>
in full detail. This is a basic result of category theory and most introductions leave it as an exercise.
It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.</p>
<p>Andrej Bauer’s proof is what most introductions are expecting you to produce.
I very much like the <a href="styles-of-category-theory.html">representability perspective</a> on category theory
and like to see what proofs look like using this perspective.</p>
<p>So this is a proof of the pullback lemma from the perspective of representability.</p>
<h2 id="preliminaries">Preliminaries</h2>
<p>The key thing we need here is a characterization of pullbacks in terms of representability. To
just jump to the end, we have for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is <strong>the
pullback of |f| and |g|</strong> if and only if it represents the functor
\[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]</p>
<p>That is to say we have the natural isomorphism \[
\mathrm{Hom}({-}, A \times_{f,g} B) \cong
\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}
\]</p>
<p>We’ll write the right-to-left direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B|
where |u : U \to A| and |v : U \to B| satisfy |f \circ u = g \circ v|. Applying
the isomorphism left to right to the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A|
and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and
|p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a <em>natural</em> isomorphism.)</p>
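<p>In |\mathbf{Set}|, this functor is represented by the familiar set of compatible pairs. The Haskell sketch below (finite sets as lists; all names hypothetical) builds that pullback, the pairing |\langle u,v\rangle|, and the projections, and compares the sizes of both sides of the isomorphism for a finite |U|. Counting only checks the bijection at one object, not naturality.</p>

```haskell
import Control.Monad (replicateM)

-- The pullback of f : A -> C and g : B -> C in finite sets (sets as lists).
pullback :: Eq c => [a] -> [b] -> (a -> c) -> (b -> c) -> [(a, b)]
pullback xs ys f g = [ (x, y) | x <- xs, y <- ys, f x == g y ]

-- Right to left: a compatible pair (u, v) gives <u, v> : U -> A x_{f,g} B.
pairing :: (t -> a) -> (t -> b) -> t -> (a, b)
pairing u v t = (u t, v t)

-- Left to right applied to the identity: the projections p_1 and p_2.
p1 :: (a, b) -> a
p1 = fst

p2 :: (a, b) -> b
p2 = snd

-- For U with n elements, count both sides of
--   Hom(U, A x_{f,g} B)  ~=  {(h, k) | f . h = g . k},
-- representing a function out of U by its list of n values.
countBothSides :: Eq c => Int -> [a] -> [b] -> (a -> c) -> (b -> c) -> (Int, Int)
countBothSides n xs ys f g =
  ( length (replicateM n (pullback xs ys f g))
  , length [ () | h <- replicateM n xs, k <- replicateM n ys, map f h == map g k ] )
```

<p>With |A = \{0,1,2,3\}|, |B = \{10,11\}|, and |f = g = \mathrm{even}|, the pullback has four elements, so both sides have |4^2 = 16| elements when |U| has two elements.</p>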
<p>One nice thing about representability is that it reduces categorical reasoning to set-theoretic
reasoning that you are probably already used to, as we’ll see. You can connect this definition
to the typical universal-property-based definition used in Andrej Bauer’s article. Here we’re taking
it as the definition of the pullback.</p>
<h2 id="proof">Proof</h2>
<p>The claim to be proven is if the right square in the below diagram is a pullback square, then the left
square is a pullback square if and only if the whole rectangle is a pullback square.
\[
\xymatrix {
A \ar[d]_{q_1} \ar[r]^{q_2} &amp; B \ar[d]_{p_1} \ar[r]^{p_2} &amp; C \ar[d]^{h} \\
X \ar[r]_{f} &amp; Y \ar[r]_{g} &amp; Z
}\]</p>
<p>Rewriting the diagram as equations, we have:</p>
<p><strong>Theorem</strong>: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a
pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if
|(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.</p>
<p><strong>Proof</strong>: If |(A, q_1, q_2)| is a pullback of |f| and |p_1|, then we have the following.</p>
<p>\[\begin{align}
\mathrm{Hom}({-}, A)
&amp; \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\
&amp; \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\
&amp; = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\
&amp; = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \}
\end{align}\]</p>
<p>The second isomorphism holds because |B| is a pullback: |u_2| is an arrow into |B|, so it’s necessarily
of the form |\langle v_1, v_2\rangle| with |g \circ v_1 = h \circ v_2|. The first equality is just |p_1 \circ \langle v_1, v_2\rangle = v_1|
mentioned earlier. The second equality merely eliminates the use of |v_1| using the equation |f \circ u_1 = v_1|.</p>
<p>This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback
of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through
the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism.
We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.</p>
<p>We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|</p>
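<p>The theorem can also be sanity-checked in |\mathbf{Set}| with a small Haskell sketch (hypothetical data, finite sets as lists). It computes |B| as the pullback of |g| and |h|, |A| as the pullback of |f| and |p_1|, and the pullback of |g \circ f| and |h| directly, then checks that |(q_1, p_2 \circ q_2)| sends |A| bijectively onto the latter. This is one instance, not a proof.</p>

```haskell
import Data.List (sort)

-- The pullback of f : A -> C and g : B -> C in finite sets (sets as lists).
pullback :: Eq c => [a] -> [b] -> (a -> c) -> (b -> c) -> [(a, b)]
pullback xs ys f g = [ (x, y) | x <- xs, y <- ys, f x == g y ]

-- Hypothetical objects and arrows for the bottom row X -f-> Y -g-> Z and h : C -> Z.
xSet, ySet :: [Int]
xSet = [0 .. 5]
ySet = [0 .. 2]

cSet :: [Char]
cSet = "abc"

f :: Int -> Int
f x = x `mod` 3

g :: Int -> Bool
g = even

h :: Char -> Bool
h c = c == 'a'

-- Right square: B = Y x_Z C, with p1 = fst and p2 = snd.
bSet :: [(Int, Char)]
bSet = pullback ySet cSet g h

-- Left square: A = X x_Y B along f and p1, with q1 = fst and q2 = snd.
aSet :: [(Int, (Int, Char))]
aSet = pullback xSet bSet f fst

-- Whole rectangle: X x_Z C along g . f and h.
rectangle :: [(Int, Char)]
rectangle = pullback xSet cSet (g . f) h

-- The comparison map (q1, p2 . q2) : A -> X x_Z C.
toRectangle :: (Int, (Int, Char)) -> (Int, Char)
toRectangle (x, (_, c)) = (x, c)
```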
<p>The simplicity and compactness of this proof demonstrates why I like representability.</p>]]></description>
    <pubDate>2024-01-14 17:33:54-08:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/the-pullback-lemma-in-gory-detail-redux.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>Universal Quantification and Infinite Conjunction</title>
    <link>https://www.hedonisticlearning.com/posts/universal-quantification-and-infinite-conjunction.html</link>
    <description><![CDATA[<script>
extraMacros = {
  den: ["{[\\\![#1]\\\!]}", 1],
  bigden: ["{\\left[\\\!\\\!\\left[#1\\right]\\\!\\\!\\right]}", 1]
};
</script>
<h3 id="introduction">Introduction</h3>
<p>It is not uncommon for universal quantification to be described as
(potentially) infinite conjunction<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.
Quoting Wikipedia’s <a href="https://en.wikipedia.org/w/index.php?title=Quantifier_(logic)&amp;oldid=1060503100#Relations_to_logical_conjunction_and_disjunction">Quantifier_(logic)</a>
page (my emphasis):</p>
<blockquote>
<p>For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).</p>
<p>The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before. <strong>For infinite domains of discourse, the equivalences are similar.</strong></p>
</blockquote>
<p>While there’s a small grain of truth
to this, I think it is wrong and/or misleading far more often than
it’s useful or correct. Indeed, it takes a bit of effort to even
get a statement that makes sense at all. There’s a bit of conflation
between syntax and semantics that’s required to have it naively
make sense, unless you’re working (quite unusually) in an infinitary
logic where it is typically outright false.</p>
<p>What harm does this confusion do? The most obvious harm is that
this view does not generalize to non-classical logics. I’ll focus
on constructive logics, in particular. Besides causing problems in
these contexts, which maybe you think you don’t care about, it betrays
a significant gap in understanding of what universal quantification
actually is. Even in purely classical contexts, this confusion often
manifests, e.g., in <a href="https://math.stackexchange.com/questions/110635/how-it-is-posible-that-omega-inconsistency-does-not-lead-to-inconsistency">confusion about |\omega|-inconsistency</a>.</p>
<p>So what is the difference between universal quantification and
infinite conjunction? Well, the most obvious difference is that
infinite conjunction is indexed by some (meta-theoretic) set that
doesn’t have anything to do with the domain the universal quantifier
quantifies over. However, even if these sets happened to coincide<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> there are
still differences between universal quantification and infinite
conjunction. The key is that universal quantification
requires the predicate being quantified over to hold <em>uniformly</em>,
while infinite conjunction does not. It just so happens that for
the standard set-theoretic semantics of classical first-order logic
this “uniformity” constraint is degenerate. However, even for
classical first-order logic, this notion of uniformity will be
relevant.</p>
<!--more-->
<h3 id="classical-semantic-view">Classical Semantic View</h3>
<p>I want to start in the context where this identification is closest
to being true, so I can show where the idea comes from. The summary
of this section is that the standard, classical, set-theoretic
semantics of universal quantification is equivalent to an infinitary
generalization of the semantics of conjunction. The issue is
“infinitary generalization of the semantics of conjunction” isn’t
the same as “semantics of infinitary conjunction”.</p>
<p>The standard set-theoretic semantics of classical first-order logic
interprets each formula, |\varphi|, as a subset of |D^{\mathsf{fv}(\varphi)}|
where |D| is a given domain set and |\mathsf{fv}| computes the (necessarily finite)
set of free variables of |\varphi|. Traditionally, |D^{\mathsf{fv}(\varphi)}|
would be identified with |D^n| where |n| is the cardinality of |\mathsf{fv}(\varphi)|.
This involves an arbitrary mapping of the free variables of |\varphi|
to the numbers |1| to |n|. The semantics of a formula then becomes an |n|-ary
set-theoretic relation.</p>
<p>The interpretation of binary conjunction is straightforward:</p>
<p>\[\den{\varphi \land \psi} = \den{\varphi} \cap \den{\psi}\]</p>
<p>where |\den{\varphi}| stands for the interpretation of the
formula |\varphi|. To be even
more explicit, I should index this notation by a structure which specifies
the domain, |D|, as well as the interpretations of any predicate or function
symbols, but we’ll just consider this fixed but unspecified.</p>
<p>The interpretation of universal quantification is more complicated
but still fairly straightforward:</p>
<p>\[\den{\forall x.\varphi} = \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\}\]</p>
<p>Set-theoretically, we have:</p>
<p>\[\begin{align}
\bar z \in \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\}
\iff &amp; \forall d \in D. \bar z \in \left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \\
\iff &amp; \forall d \in D. \exists \bar y \in \den{\varphi}. \bar z = \bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \land \bar y(x) = d \\
\iff &amp; \forall d \in D. \bar z[x \mapsto d] \in \den{\varphi}
\end{align}\]</p>
<p>where |f[x \mapsto c]| extends a function |f \in D^{S}| to a function in |D^{S \cup \{x\}}|
via |f[x \mapsto c](v) = \begin{cases}c, &amp;\textrm{ if }v = x \\ f(v), &amp;\textrm{ if }v \neq x\end{cases}|.
The final |\iff| arises because |\bar z[x \mapsto d]| is the <em>unique</em> function which
extends |\bar z| to the desired domain such that |x| is mapped to |d|. Altogether, this
confirms the intended semantics: |\bar z| lies in |\den{\forall x.\varphi}| exactly when
|\bar z[x \mapsto d] \in \den{\varphi}| no matter which element |d| of the domain |x| is interpreted as.</p>
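<p>For a finite domain, the big-intersection formula can be checked mechanically against the
last line of the derivation. The following is a minimal Python sketch, not part of the original
development: elements of |D^S| are modeled as frozensets of (variable, value) pairs, a formula is
represented only by its extension, and the names <code>assignments</code>, <code>restrict</code>,
<code>extend</code>, and <code>den_forall</code> are hypothetical.</p>

```python
from itertools import product

def assignments(variables, domain):
    """D^S: every assignment of domain elements to the variables in S,
    represented as a frozenset of (variable, value) pairs."""
    vs = sorted(variables)
    return {frozenset(zip(vs, vals)) for vals in product(sorted(domain), repeat=len(vs))}

def restrict(a, variables):
    """The restriction of the assignment a to the given variables."""
    return frozenset((v, d) for v, d in a if v in variables)

def extend(a, x, d):
    """a[x |-> d]: the unique extension of a that maps x to d."""
    return frozenset((v, e) for v, e in a if v != x) | {(x, d)}

def den_forall(x, phi_vars, phi_den, domain):
    """[[forall x. phi]] as the intersection over d in D of the restrictions of
    the assignments in [[phi]] sending x to d. Assumes x is in phi_vars.
    Starting from all of D^(fv(phi) minus x) makes an empty D give the empty intersection."""
    rest = phi_vars - {x}
    result = assignments(rest, domain)
    for d in domain:
        result = result.intersection({restrict(a, rest) for a in phi_den if dict(a)[x] == d})
    return result

# Example: D = {0, 1, 2} and phi := x <= y, so forall x. phi says "y is maximal".
D = {0, 1, 2}
phi_vars = {"x", "y"}
phi_den = {a for a in assignments(phi_vars, D) if dict(a)["x"] <= dict(a)["y"]}
lhs = den_forall("x", phi_vars, phi_den, D)
rhs = {z for z in assignments({"y"}, D)
       if all(extend(z, "x", d) in phi_den for d in D)}
print(lhs == rhs)              # True
print([dict(z) for z in lhs])  # [{'y': 2}]
```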
<p>This seems to bear out the slogan that the semantics of quantification is an infinitary
version of the semantics of conjunction, since |\bigcap| is an infinitary version of |\cap|.
But even here there are substantial cracks in this perspective.</p>
<h3 id="infinitary-logic">Infinitary Logic</h3>
<p>The first problem is that we don’t have an infinitary conjunction, so saying
universal quantification is essentially infinitary conjunction doesn’t make sense.
However, it’s easy enough to formulate the syntax and semantics of infinitary
conjunction (assuming we have a meta-theoretic notion of sets).</p>
<p>Syntactically, for a (meta-theoretic) set |I| and an |I|-indexed family of formulas
|\{\varphi_i\}_{i \in I}|, we have the infinitary conjunction |\bigwedge_{i \in I} \varphi_i|.</p>
<p>The set-theoretic semantics of this connective <em>is</em> a direct generalization of the
binary conjunction case:</p>
<p>\[\bigden{\bigwedge_{i \in I}\varphi_i} = \bigcap_{i \in I}\den{\varphi_i}\]</p>
<p>If |I = \{1,2\}|, we recover exactly the binary conjunction case.</p>
<p>Equipped with a semantics of <em>actual</em> infinite conjunction, we can compare it
to the semantics of universal quantification and see where things go wrong.</p>
<p>The first problem is that it makes no sense to choose |I| to be |D|. The formula
|\bigwedge_{i \in I} \varphi_i| can be interpreted with respect to many
different domains. So any particular choice of |D| would be wrong for most semantics.
This also assumes that our syntax’s meta-theoretic sets are the same as our
semantics’ meta-theoretic sets, which need not be the case at all<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<p>An even bigger problem is that infinitary conjunction expects a family of formulas
while universal quantification has just one. This is one facet of the uniformity
I mentioned. Universal quantification has one formula that is interpreted a single
way (with respect to the given structure). The infinitary intersection expression
is computing a set out of this singular interpretation. Infinitary conjunction, on
the other hand, has a family of formulas which need have no relation to each other.
Each of these formulas is independently interpreted and then all those separate
interpretations are combined with an infinitary intersection. The problem we have
is that there’s generally no way to take a formula |\varphi| with free variable |x|
and an element |d \in D| and make a formula |\varphi_d| with |x| not free such that
|\bar y[x \mapsto d] \in \den{\varphi} \iff \bar y \in \den{\varphi_d}|. A simple
cardinality argument shows this: over a countable signature, there are only countably many (finitary) formulas,
but there are plenty of uncountable domains. This is why |\omega|-inconsistency is
possible. We can easily have elements in the domain which cannot be captured by any
formula.</p>
<h3 id="syntactic-view">Syntactic View</h3>
<p>Instead of taking a semantic view, let’s take a syntactic view of universal
quantification and infinitary conjunction, i.e. let’s compare the rules that
characterize them. As before, the first problem we have is that traditional
first-order logic does not have infinitary conjunction, but we can easily
formulate what the rules would be.</p>
<p>The elimination rules are superficially similar but have subtle, important
distinctions:</p>
<p>\[\frac{\Gamma \vdash \forall x.\varphi}{\Gamma \vdash \varphi[x \mapsto t]}\forall E,t
\qquad \frac{\Gamma \vdash \bigwedge_{i \in I} \varphi_i}{\Gamma \vdash \varphi_j}{\wedge}E,j\]
where |t| is a term, |j| is an element of |I|, and |\varphi[x \mapsto t]| corresponds
to syntactically substituting |t| for |x| in |\varphi| in a capture-avoiding way. A first,
not-so-subtle distinction is that if |I| is an infinite set, then |\bigwedge_{i \in I}\varphi_i|
is an infinitely large formula. Another pretty obvious issue is that universal quantification is
restricted to instantiating terms, while |I| stands either for an <em>arbitrary</em> (meta-theoretic)
set or for some particular (meta-theoretic) set, e.g. |\mathbb N|. Either
way, it is typically not the set of terms of the logic.</p>
<p>Arguably, this isn’t an issue since the claim isn’t that every infinite conjunction
corresponds to a universal quantification, but only that universal quantification
corresponds to some infinite conjunction. The set of terms is a possible choice for
|I|, so that shouldn’t be a problem. Well, whether it’s a problem or not depends on
how you set up the syntax of the language. In my preferred way of handling the syntax
of logical formulas, I index each formula by the set of free variables that may occur
in that formula. This means the set of terms varies with the set of possible free
variables. If we write |\vdash_V \varphi| to mean |\varphi| is well-formed and provable
in a context with free variables |V|, then we would want the following rule:</p>
<p>\[\frac{\vdash_V \varphi}{\vdash_U \varphi}\] where |V \subseteq U|. This
simply states that if a formula is provable, it should remain provable even if we
add more (unused) free variables. This causes a problem with having an infinitary
conjunction indexed by terms. Write |\mathsf{Term}(V)| for the set of terms
with (potential) free variables in |V|. While |\vdash_V \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t|
might be okay, the rule above would then give |\vdash_U \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t|,
which would hold but would no longer correspond to universal quantification in
a context with free variables in |U|. This really makes a difference. For example,
for many theories, such as the usual presentation of <strong>ZFC</strong>, |\mathsf{Term}(\varnothing) = \varnothing|,
i.e. there are no closed terms. As such, |\vdash_\varnothing \forall x.\bot|
is neither provable (which we wouldn’t expect it to be) <em>nor refutable</em> without additional axioms. On the
other hand, |\bigwedge_{i \in \varnothing}\bot| is |\top| and thus trivially
provable. If we consider |\vdash_{\{y\}} \forall x.\bot| next, it becomes
refutable. This doesn’t contradict our earlier rule about adding free variables
because |\vdash_\varnothing \forall x.\bot| wasn’t provable and so the rule
says nothing. On the other hand, that rule does require |\vdash_{\{y\}} \bigwedge_{i \in \varnothing}\bot|
to be provable, and it is. Of course, it no longer corresponds to |\forall x.\bot|
with this set of free variables. The putative corresponding formula would be
|\bigwedge_{i \in \{y\}}\bot| which is indeed refutable.</p>
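<p>The dependence of |\mathsf{Term}(V)| on |V| can be made concrete with a small Python sketch.
It is only an illustration under stated assumptions: terms are enumerated up to a fixed nesting
depth (since |\mathsf{Term}(V)| is generally infinite), a ZFC-like signature is taken to have no
constants and no function symbols, and <code>terms</code> and <code>big_and</code> are hypothetical
helpers. The conjunction of |\bot| over |\mathsf{Term}(\varnothing)| comes out true, while the one
over |\mathsf{Term}(\{y\})| does not.</p>

```python
from itertools import product

def terms(variables, constants=(), functions=(), depth=2):
    """Finite approximation of Term(V): terms of nesting depth at most `depth`.
    Variables and constants are strings; an application f(t1, ..., tn) is the
    tuple (f, t1, ..., tn); `functions` is a sequence of (name, arity) pairs."""
    ts = set(variables) | set(constants)
    for _ in range(depth):
        ts = ts | {(f, *args)
                   for f, arity in functions
                   for args in product(sorted(ts, key=repr), repeat=arity)}
    return ts

def big_and(truth_values):
    """Conjunction over a (here finite) family; the empty conjunction is true."""
    return all(truth_values)

# ZFC-like signature: no constants and no function symbols, so Term(V) = V.
print(terms(set()), big_and(False for t in terms(set())))        # set() True
print(terms({"y"}), big_and(False for t in terms({"y"})))        # {'y'} False
# An arithmetic-like signature with 0 and successor does have closed terms.
print(len(terms(set(), constants={"0"}, functions=[("S", 1)])))  # 3
```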
<p>With the setup above, we can’t get the elimination rule for |\bigwedge| to
correspond to the elimination rule for |\forall|, because there isn’t a
single set of terms. However, a more common, if less clean, approach is to allow
all free variables all the time, i.e. to fix a single countably infinite
set of variables once and for all. This would “resolve” this problem.</p>
<p>The differences in the introduction rules are more stark. The rules are:</p>
<p>\[\frac{\Gamma \vdash \varphi \quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\forall I
\qquad \frac{\left\{\Gamma \vdash \varphi_i \right\}_{i \in I}}{\Gamma \vdash \bigwedge_{i \in I}\varphi_i}{\wedge}I\]</p>
<p>Again, the most blatant difference is that (when |I| is infinite) |{\wedge}I|
corresponds to an infinitely large derivation. Again, the uniformity aspects
show through. |\forall I| requires a single derivation that will handle all
terms, whereas |{\wedge}I| allows a different derivation for each |i \in I|.</p>
<p>We don’t run into the same issue as in the semantic view with needing to turn
elements of the domain into terms/formulas. Given a formula |\varphi| with
free variable |x|, we can easily make a formula |\varphi_t| for every
term |t|, namely |\varphi_t = \varphi[x \mapsto t]|. We won’t have the
issue that leads to |\omega|-inconsistency because |\forall x.\varphi|
is derivable from |\bigwedge_{t \in \mathsf{Term}(V)}\varphi[x \mapsto t]|.
Of course, the reason this is true is that one of the terms in |\mathsf{Term}(V)|
will be a variable not occurring in |\Gamma|, allowing us to derive the
premise of |\forall I|. On the other hand, if we choose |I = \mathsf{Term}(\varnothing)|,
i.e. only consider closed terms, which is what <a href="https://en.wikipedia.org/wiki/%CE%A9-consistent_theory#%CF%89-logic">the |\omega| rule in arithmetic</a>
is doing, then we definitely can get |\omega|-inconsistency-like situations,
most notably in the case of theories, like <strong>ZFC</strong>, which have <em>no</em> closed
terms.</p>
<h3 id="constructive-view">Constructive View</h3>
<p>A constructive perspective allows us to accentuate the contrast between
universal quantification and infinitary conjunction even more as well
as bring more clarity to the notion of uniformity.</p>
<p>We’ll start with the <a href="https://en.wikipedia.org/wiki/Brouwer%E2%80%93Heyting%E2%80%93Kolmogorov_interpretation">BHK interpretation</a>
of intuitionistic logic and specifically a <a href="https://en.wikipedia.org/wiki/Realizability">realizability</a>
interpretation. For this, we’ll allow infinitary conjunction only for |I = \mathbb N|.</p>
<p>I’ll write |n\textbf{ realizes }\varphi| for the statement that the natural number |n|
realizes the formula |\varphi|. As in the linked articles, we’ll need a computable <a href="https://en.wikipedia.org/wiki/Pairing_function">pairing function</a>
which computably encodes a pair of natural numbers as a natural number. I’ll just write
this using normal pairing notation, i.e. |(n,m)|. We’ll also need a Gödel numbering of the
(partial) computable functions, writing |f_n| for the function coded by the natural number |n|.</p>
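<p>The post doesn’t fix a particular pairing function. One standard choice is Cantor’s pairing
function, |(n, m) \mapsto (n+m)(n+m+1)/2 + m|. The Python sketch below assumes that choice and shows
that it is computably invertible.</p>

```python
from math import isqrt

def pair(n, m):
    """Cantor pairing: a computable bijection from N x N to N."""
    return (n + m) * (n + m + 1) // 2 + m

def unpair(k):
    """Inverse of pair: recover the diagonal w = n + m, then the offset m."""
    w = (isqrt(8 * k + 1) - 1) // 2
    t = w * (w + 1) // 2
    m = k - t
    return (w - m, m)

print(pair(3, 4), unpair(32))  # 32 (3, 4)
```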
<p>\[\begin{align}
(n_0, n_1)\textbf{ realizes }\varphi_0 \land \varphi_1
\quad &amp; \textrm{if and only if} \quad n_0\textbf{ realizes }\varphi_0\textrm{ and }
n_1\textbf{ realizes }\varphi_1 \\
n\textbf{ realizes }\forall x.\varphi
\quad &amp; \textrm{if and only if}\quad \textrm{for all }m, f_n(m)\textbf{ realizes }\varphi[x \mapsto m] \\
(k, n_k)\textbf{ realizes }\varphi_0 \lor \varphi_1
\quad &amp; \textrm{if and only if} \quad k \in \{0, 1\}\textrm{ and }n_k\textbf{ realizes }\varphi_k \\
n\textbf{ realizes }\neg\varphi
\quad &amp; \textrm{if and only if} \quad\textrm{there is no }m\textrm{ such that }m\textbf{ realizes }\varphi
\end{align}\]</p>
<p>I included disjunction and negation in the above so I could talk about the Law of the Excluded Middle.
Via the above interpretation, given any formula |\varphi| with free variable |x|,
the meaning of |\forall x.\varphi\lor\neg\varphi| would be a computable function
which for each natural number |m| produces a bit indicating whether or not
|\varphi[x \mapsto m]| holds. If the Law of the Excluded Middle held, every such formula
would thus be computationally decidable, which we know isn’t the case. For example,
choose |\varphi| as the formula which asserts that the |x|-th Turing machine halts.</p>
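<p>By contrast, when |\varphi| is decidable, a realizer for |\forall x.\varphi\lor\neg\varphi| is
easy to write down. The sketch below is a simplification, not the formal definition: realizers are
Python values, Python tuples stand in for the pairing function, a Python function stands in for
|f_n|, and true atomic formulas and negations are realized by a dummy |0|. It takes |\varphi(x)| to
be “|x| is even”; the names are hypothetical.</p>

```python
def phi(m):
    """Decidable atomic predicate: phi(x) := 'x is even'."""
    return m % 2 == 0

def lem_realizer(m):
    """Realizer for forall x. (phi or not phi): for each m it returns (k, r),
    where k = 0 selects the disjunct phi[x := m] and k = 1 selects
    (not phi)[x := m]; the dummy r = 0 realizes the chosen disjunct."""
    return (0, 0) if phi(m) else (1, 0)

def agrees(realizer, truth, bound):
    """Check on 0 .. bound-1 that the realizer always picks the true disjunct."""
    return all((realizer(m)[0] == 0) == truth(m) for m in range(bound))

print(lem_realizer(4), lem_realizer(7))  # (0, 0) (1, 0)
print(agrees(lem_realizer, phi, 1000))   # True
```

<p>The single function <code>lem_realizer</code> handles every |m| uniformly. For the halting
formula, a realizer of the same shape would be a total computable function deciding halting, which
cannot exist.</p>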
<p>This example illustrates the uniformity constraint. Assuming a traditional, classical
meta-language, e.g. <strong>ZFC</strong>, it is the case that |(\varphi\lor\neg\varphi)[x \mapsto m]| is
realized for each |m| in the case where |\varphi| is asserting the halting of the |x|-th
Turing machine<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>. But this interpretation of universal quantification
requires not only that the quantified formula holds for all naturals, but also that
we can computably find this out.</p>
<p>It’s clear that trying to formulate a notion of infinitary conjunction with regard
to realizability would require using something other than natural numbers as realizers
if we just directly generalize the finite conjunction case. For example, we might
use potentially infinite sequences of natural numbers as realizers. Regardless,
the discussion of the previous example makes it clear that an interpretation of infinitary
conjunction can’t be done in standard computability<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>,
while, obviously, universal quantification can.</p>
<h3 id="categorical-view">Categorical View</h3>
<p>The categorical semantics of universal quantification and conjunction are quite different,
which also suggests that they are not related, at least not in some straightforward way.</p>
<p>One way to get to categorical semantics is to restate traditional, set-theoretic semantics
in categorical terms. Traditionally, the semantics of a formula is a subset
of some product of the domain set, one for each free variable. Categorically, that
suggests we want finite products and the categorical semantics of a formula should
be a subobject of a product of some object representing the domain.</p>
<p>Conjunction is traditionally represented via intersection of subsets, and categorically
we form the intersection of subobjects via pulling back. So to support finite conjunctions,
we need our category to additionally have finite pullbacks of monomorphisms. Infinitary
conjunctions simply require infinitely wide pullbacks of monomorphisms. However, we
can start to see some cracks here. What does it mean for a pullback to be infinitely wide?
It means the obvious thing; namely, that we have an infinite set of monomorphisms sharing
a codomain, and we’ll take the limit of this diagram. The key here, though, is “<em>set</em>”.
Regardless of whatever the objects of our semantic category are, the infinitary conjunctions
are indexed by a set.</p>
<p>To talk about the categorical semantics of universal quantification, we need to bring to
the foreground some structure that we have been leaving – and traditional accounts do leave
– in the background. Before, I said the semantics of a formula, |\varphi|, depends on the
free variables in that formula, e.g. if |D| is our domain object, then the semantics of
a formula with three free variables would be a subobject of |\prod_{v \in \mathsf{fv}(\varphi)}D \cong D\times D \times D|
which I’ll continue to write as |D^{\mathsf{fv}(\varphi)}| though now it will be
interpreted as a product rather than a function space, i.e. we interpret this notation
as a <a href="https://ncatlab.org/nlab/show/powering">power</a>. For |\mathbf{Set}|, this makes no difference.
It would be more accurate to say that a formula can be given semantics in any product
of the domain object indexed by any <em>superset</em> of the free variables. This is just
to say that a formula doesn’t need to use every free variable that is available. Nevertheless,
even if it is induced by the same formula, a subobject of |D^{\mathsf{fv}(\varphi)}| is
a different subobject than a subobject of |D^{\mathsf{fv}(\varphi) \cup \{u\}}|
where |u| is a variable not free in |\varphi|, so we need a way of relating the semantics
of formulas considered with respect to different sets of free variables.</p>
<p>To do this, we will formulate a category of contexts and index our semantics by it.
Fix a category |\mathcal C| and an object |D| of |\mathcal C|. Our category
of contexts, |\mathsf{Ctx}|, will be the full subcategory of |\mathcal C| with
objects of the form |D^S| where |S| is a finite subset of |V|, a fixed set of
variables. We’ll assume these products exist, though typically we’ll just assume
that |\mathcal C| has <em>all</em> finite products. From here, we use the |\mathsf{Sub}| functor.
|\mathsf{Sub} : \mathsf{Ctx}^{op} \to \mathbf{Pos}| maps an object
of |\mathsf{Ctx}| to the poset of its subobjects as objects of |\mathcal C|<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>.
Now an arrow |f : D^{\{x,y,z,w\}} \to D^{\{x,y,z\}}| would induce a monotonic function
|\mathsf{Sub}(f) : \mathsf{Sub}(D^{\{x,y,z\}}) \to \mathsf{Sub}(D^{\{x,y,z,w\}})|.
This is defined for each subobject by pulling back a representative monomorphism of that subobject along |f|.
Arrows of |\mathsf{Ctx}| are the semantic analogues of substitutions, and |\mathsf{Sub}(f)|
applies these “substitutions” to the semantics of formulas.</p>
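<p>When |\mathcal C = \mathbf{Set}|, pulling back a mono along |f| amounts to taking an inverse
image, so |\mathsf{Sub}(f)| is just |U \mapsto f^{-1}(U)|. The following Python sketch (finite |D|
only, hypothetical helper names) shows this for the projection that adds an unused variable |w|, and
for the “diagonal” arrow |D^{\{x\}} \to D^{\{x,y\}}| corresponding to substituting |x| for |y|.</p>

```python
from itertools import product

def power(domain, variables):
    """D^S in Set: all assignments S -> D, as frozensets of (variable, value) pairs."""
    vs = sorted(variables)
    return {frozenset(zip(vs, vals)) for vals in product(sorted(domain), repeat=len(vs))}

def sub(f, source):
    """Sub(f) for an arrow f out of `source` in Set: pulling a subset back
    along f is taking its inverse image."""
    return lambda U: {a for a in source if f(a) in U}

D = {0, 1}

def proj(a):
    """Projection D^{x,y,z,w} -> D^{x,y,z}: forget w (weakening by w)."""
    return frozenset((v, d) for v, d in a if v != "w")

U = {a for a in power(D, {"x", "y", "z"}) if len({d for _, d in a}) == 1}  # x = y = z
pulled = sub(proj, power(D, {"x", "y", "z", "w"}))(U)
print(len(U), len(pulled))  # 2 4

def diag(a):
    """Diagonal D^{x} -> D^{x,y}: send a to (x := a(x), y := a(x))."""
    return frozenset({("x", dict(a)["x"]), ("y", dict(a)["x"])})

neq = {a for a in power(D, {"x", "y"}) if dict(a)["x"] != dict(a)["y"]}  # x != y
print(sub(diag, power(D, {"x"}))(neq))  # set(), since x != x never holds
```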
<p>Universal quantification is then characterized as the (indexed) right adjoint (Galois
connection in this context) of |\mathsf{Sub}(\pi^x)| where |\pi^x : D^S \to D^{S \setminus \{x\}}|
is just projection. The indexed nature of this adjoint leads to <a href="/posts/beck-chevalley.html">Beck-Chevalley conditions</a>
reflecting the fact that universal quantification should respect substitution.
|\mathsf{Sub}(\pi^x)| corresponds to adding |x| as a new, unused free variable to a formula.
Let |U| be a subobject of |D^{S \setminus \{x\}}| and |V| a subobject of |D^S|. Furthermore, write
|U \sqsubseteq U’| to indicate that |U| is a subobject of the subobject |U’|, i.e.
that the monos that represent |U| factor through the monos that represent |U’|. The
adjunction then states: \[\mathsf{Sub}(\pi^x)(U) \sqsubseteq V\quad \textrm{if and only if}\quad U \sqsubseteq \forall_x(V)\]
The |\implies| direction is a fairly direct semantic analogue of the |\forall I| rule:
\[\frac{\Gamma \vdash \varphi\quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\]
Indeed, it is easy to show that the converse of this rule is derivable with |\forall E|,
validating the semantic “if and only if”. To be clear, the full adjunction is natural in
|U| and |V| and indexed, effectively, in |S|.</p>
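<p>For finite sets, the adjunction can be checked exhaustively. The Python sketch below is only an
illustration in |\mathbf{Set}| (assuming a finite |D|, with hypothetical helper names):
<code>weaken</code> computes |\mathsf{Sub}(\pi^x)| as an inverse image, <code>forall</code> computes
|\forall_x(V)| as the set of |\bar z| with |\bar z[x \mapsto d] \in V| for every |d|, and the script
tests the “if and only if” for every pair of subobjects when |D = \{0, 1\}| and |S = \{x, y\}|.</p>

```python
from itertools import chain, combinations, product

def power(domain, variables):
    """D^S in Set: all assignments S -> D, as frozensets of (variable, value) pairs."""
    vs = sorted(variables)
    return {frozenset(zip(vs, vals)) for vals in product(sorted(domain), repeat=len(vs))}

def subsets(s):
    """All subsets of a finite set, i.e. all of its subobjects in Set."""
    s = list(s)
    return [set(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def weaken(U, x, domain, S):
    """Sub(pi^x)(U): the inverse image of U along the projection D^S -> D^(S minus x)."""
    return {a for a in power(domain, S)
            if frozenset((v, d) for v, d in a if v != x) in U}

def forall(x, V, domain, S):
    """forall_x(V): the z in D^(S minus x) with z[x |-> d] in V for every d in D."""
    return {z for z in power(domain, S - {x})
            if all(z | {(x, d)} in V for d in domain)}

D, S = {0, 1}, {"x", "y"}
galois = all((weaken(U, "x", D, S) <= V) == (U <= forall("x", V, D, S))
             for U in subsets(power(D, S - {"x"}))
             for V in subsets(power(D, S)))
print(galois)  # True
```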
<p>Incidentally, we’d also want the semantics of infinite conjunctions to respect substitution,
so they too satisfy a Beck-Chevalley condition and give rise to an indexed right adjoint.</p>
<p>It’s hard to even compare the categorical semantics of infinitary conjunction and universal
quantification, let alone conflate them, <em>even when |\mathcal C = \mathbf{Set}|</em>. This isn’t
too surprising as these semantics work just fine for constructive logics where, as illustrated
earlier, these can be semantically distinct. As mentioned, both of these constructs can be
described by indexed right adjoints. However, they are adjoints between very different indexed
categories. If |\mathcal M| is our indexed category (above it was |\mathsf{Sub}|), then we’ll
have |I|-indexed products if |\Delta_{\mathcal M} : \mathcal M \to [DI, -] \circ \mathcal M|
has an indexed right adjoint where |D : \mathbf{Set} \to \mathbf{cat}| is the discrete (small)
category functor. For |\mathcal M| to have universal quantification, we need an indexed right
adjoint to an indexed functor |\mathcal M \circ \mathsf{cod} \circ \iota \to \mathcal M \circ \mathsf{dom} \circ \iota|
where |\iota : s(\mathsf{Ctx}) \hookrightarrow \mathsf{Ctx}^{\to}| is the full subcategory
of the arrow category |\mathsf{Ctx}^{\to}| consisting of just the projections.</p>
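<p>To make the difference in shape concrete in the simplest case, here is a Python sketch for the
subobjects of a single finite set |X| (an illustration only, with a finite |I| standing in for an
arbitrary index set and hypothetical helper names). The |I|-indexed product is intersection, right
adjoint to the diagonal |\Delta| that sends |U| to the constant family. Unlike |\forall_x|, it stays
within the one poset |\mathsf{Sub}(X)| rather than moving between |\mathsf{Sub}(D^S)| and
|\mathsf{Sub}(D^{S \setminus \{x\}})|.</p>

```python
from itertools import chain, combinations, product

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    s = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def delta(U, I):
    """Delta: Sub(X) -> Sub(X)^I, the constant I-indexed family at U."""
    return {i: U for i in I}

def big_meet(family, X):
    """Right adjoint to Delta: the intersection of the family (all of X if empty)."""
    result = frozenset(X)
    for V in family.values():
        result = result.intersection(V)
    return result

def leq_family(F, G):
    """The pointwise order on Sub(X)^I."""
    return all(F[i] <= G[i] for i in F)

X, I = {0, 1, 2}, [0, 1]
subs = subsets(X)
ok = all(leq_family(delta(U, I), dict(zip(I, Vs))) == (U <= big_meet(dict(zip(I, Vs)), X))
         for U in subs for Vs in product(subs, repeat=len(I)))
print(ok)  # True
```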
<h3 id="conclusion">Conclusion</h3>
<p>My hope is that the preceding makes it abundantly clear that viewing universal quantification
as some kind of special “infinite conjunction” is not sensible even approximately. To do so is
to seriously misunderstand universal quantification. Most discussions “equating” them involve
significant conflations of syntax and semantics where a specific choice of domain is fixed and elements
of that specific domain are used as terms.</p>
<p>A secondary goal was to illustrate an aspect of logic from a variety of perspectives and illustrate
some of the concerns in meta-logical reasoning. For example, quantifiers and connectives are
syntactical concepts and thus can’t depend on the details of the semantic domain. As another
example, better perspectives on quantifiers and connectives are more robust to weakening the
logic. I’d say this is especially true when going from classical to constructive logic.
Structural proof theory and categorical semantics are good at formulating logical concepts
modularly so that they still make sense in very weak logics.</p>
<p>Unfortunately, the traditional trend towards minimalism pushes strongly in the
other direction, exploiting every symmetry and coincidence that a
stronger logic (namely classical logic) provides and producing definitions that
don’t survive even mild weakening of the logic<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>. The attempt to identify
universal quantification with infinite conjunction here takes that impulse too
far and, as demonstrated, doesn’t even work in classical logic. While there’s
certainly value in recognizing redundancy, I personally find minimizing logical
assumptions far more important and valuable than minimizing (primitive) logical
connectives.</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>
“Universal statements are true if they are true for every individual in the
world. They can be thought of as an infinite conjunction,” from <a href="https://www.massey.ac.nz/~mjjohnso/notes/59302/l07.html">some random AI
lecture notes</a>. Dan
Bernstein writes in <a href="https://cr.yp.to/papers.html#pwccp">Papers with computer-checked proofs</a>,
“Actually, logicians who are serious about Sensible Notation write ‘for all’ as
a big prefix |\bigwedge|. See, the semantics of ‘for all’ are really a big ‘and’ operation[.]”. You
can find many others.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>The domain
doesn’t even need to be a set.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>For example,
we may formulate our syntax in a second-order arithmetic identifying our syntax’s
meta-theoretic sets with unary predicates, while our semantics is in <strong>ZFC</strong>. Just
from cardinality concerns, we know that there’s no way of injectively mapping
every <strong>ZFC</strong> set to a set of natural numbers.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>It’s probably worth pointing out that not only will this classical meta-language
not tell us whether it’s |\varphi[x \mapsto m]| or |\neg\varphi[x \mapsto m]| that
holds for every specific |m|, but it’s easy to show (assuming consistency of <strong>ZFC</strong>)
that |\varphi[x \mapsto m]| is independent of <strong>ZFC</strong> for some specific values of |m|.
For example, it’s easy to make a Turing machine that halts if and only if it finds
a contradiction in the theory of <strong>ZFC</strong>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>Interestingly, for some models
of computation, e.g. ones based on Turing machines, infinitary disjunction, or, specifically,
|\mathbb N|-ary disjunction is <em>not</em> problematic. Given an infinite sequence of halting
Turing machines, we can interleave their execution such that every Turing machine in the
sequence will halt at some finite time. Accordingly, extending the definition of
disjunction in realizability to the |\mathbb N|-ary case does not run into any
of the issues that |\mathbb N|-ary conjunction has and is completely unproblematic.
We just let |k| be an arbitrary natural instead of just |\{0, 1\}|.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>This is a place we could generalize the
categorical semantics further. There’s no reason we need to consider this particular functor.
We could consider other functors from |\mathsf{Ctx}^{op} \to \mathbf{Pos}|, i.e. other
<a href="https://ncatlab.org/nlab/show/indexed+category">indexed</a> <a href="https://ncatlab.org/nlab/show/%280%2C1%29-category+theory">|(0,1)|-categories</a>.
This setup is called a <a href="https://ncatlab.org/nlab/show/hyperdoctrine">hyperdoctrine</a>.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>The most obvious example of
this is defining quantifiers and connectives in terms of other connectives
particularly when negation is involved. A less obvious example is the overwhelming
focus on |\mathbf 2|-valued semantics when classical logic naturally allows
arbitrary Boolean-algebra-valued semantics.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>2024-01-02 22:00:41-08:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/universal-quantification-and-infinite-conjunction.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>What is the coproduct of two groups?</title>
    <link>https://www.hedonisticlearning.com/posts/what-is-the-coproduct-of-two-groups.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>The purpose of this article is to answer the question: what is the coproduct
of two groups? The approach, however, will be somewhat absurd. Instead of
simply presenting a construction and proving that it satisfies the appropriate
universal property, I want to find the general answer and simply instantiate
it for the case of groups.</p>
<p>Specifically, this will be a path through the theory of Lawvere theories and
their models with the goal of motivating some of the theory around it in
pursuit of the answer to this relatively simple question.</p>
<p>If you really just want to know the answer to the title question, then the
construction is usually called the <a href="https://en.wikipedia.org/wiki/Free_product">free product</a>
and is described on the linked Wikipedia page.</p>
<!--more-->
<h2 id="groups-as-models-of-a-lawvere-theory">Groups as Models of a Lawvere Theory</h2>
<p>A group is a model of an equational theory. This means a group is described by a set
equipped with a collection of operations that must satisfy some equations. So we’d
have a set, |G|, and operations |\mathtt{e} : 1 \to G|, |\mathtt{i} : G \to G|,
and |\mathtt{m} : G \times G \to G|. These operations satisfy the equations,
\[
\begin{align}
\mathtt{m}(\mathtt{m}(x, y), z) = \mathtt{m}(x, \mathtt{m}(y, z)) \\
\mathtt{m}(\mathtt{e}(), x) = x = \mathtt{m}(x, \mathtt{e}()) \\
\mathtt{m}(\mathtt{i}(x), x) = \mathtt{e}() = \mathtt{m}(x, \mathtt{i}(x))
\end{align}
\]
universally quantified over |x|, |y|, and |z|.</p>
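<p>As a quick sanity check, the three equations can be verified exhaustively for a small finite
group. The Python sketch below assumes |G| is the symmetric group |S_3|, represented as tuples, with
|\mathtt{m}| as composition of permutations. It also confirms that |S_3| isn’t Abelian, since
commutativity is not among the equations.</p>

```python
from itertools import permutations, product

# S_3: the permutations of {0, 1, 2} as tuples.
G = list(permutations(range(3)))

def m(p, q):
    """m(p, q) = p . q: first apply q, then p."""
    return tuple(p[q[k]] for k in range(3))

def e():
    """The nullary operation e : 1 -> G picks out the identity permutation."""
    return (0, 1, 2)

def i(p):
    """The inverse permutation."""
    inv = [0] * 3
    for k, v in enumerate(p):
        inv[v] = k
    return tuple(inv)

assoc = all(m(m(x, y), z) == m(x, m(y, z)) for x, y, z in product(G, repeat=3))
unit = all(m(e(), x) == x == m(x, e()) for x in G)
inverse = all(m(i(x), x) == e() == m(x, i(x)) for x in G)
print(assoc, unit, inverse)                                      # True True True
print(any(m(x, y) != m(y, x) for x, y in product(G, repeat=2)))  # True: not Abelian
```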
<p>These equations can easily be represented by commutative diagrams, i.e. equations
of compositions of arrows, in any category with finite products of an object, |G|,
with itself. For example, the left inverse law becomes:
\[
\mathtt{m} \circ \langle\mathtt{i}, id_G\rangle = \mathtt{e} \circ {!}_G
\]
where |{!}_G : G \to 1| is the unique arrow into the terminal object corresponding
to the |0|-ary product of copies of |G| and |\langle f, g\rangle(x) = (f(x), g(x))|.</p>
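<p>The point-free form of the left inverse law can be read directly as code. In this Python sketch
(an illustration, not from the original post), |G| is |\mathbb Z/5| under addition, |1| is the
one-element set |\{()\}|, |\mathtt{m}| takes a single pair argument to match
|\mathtt{m} : G \times G \to G|, and <code>compose</code>, <code>tupling</code>, and
<code>bang</code> are hypothetical helpers for |\circ|, |\langle -, - \rangle|, and |{!}_G|.</p>

```python
N = 5  # the cyclic group Z/5 under addition

def m(pair):
    """m : G x G -> G takes a single argument, an element of the product."""
    x, y = pair
    return (x + y) % N

def i(x):
    """i : G -> G, the additive inverse."""
    return (-x) % N

def e(unit):
    """e : 1 -> G, where the terminal object 1 is the one-element set {()}."""
    return 0

def bang(x):
    """The unique arrow !_G : G -> 1."""
    return ()

def ident(x):
    return x

def compose(f, g):
    return lambda x: f(g(x))

def tupling(f, g):
    """<f, g>(x) = (f(x), g(x))."""
    return lambda x: (f(x), g(x))

left_inverse_lhs = compose(m, tupling(i, ident))  # m . <i, id_G>
left_inverse_rhs = compose(e, bang)               # e . !_G
print(all(left_inverse_lhs(x) == left_inverse_rhs(x) for x in range(N)))  # True
```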
<p>One nice thing about this categorical description is that we can now talk about
a group object in any category with finite products. Even better, we can make this
pattern describing what a group is first-class. The (<strong>Lawvere</strong>) <strong>theory of a
group</strong> is a (small) category, |\mathcal{T}_{\mathbf{Grp}}|, whose objects are an object |\mathsf{G}| and
all its powers, |\mathsf{G}^n|, where |\mathsf{G}^0 = 1|
and |\mathsf{G}^{n+1} = \mathsf{G} \times \mathsf{G}^n|. The arrows consist
of the relevant projection and tupling operations, the three arrows above,
|\mathsf{m} : \mathsf{G}^2 \to \mathsf{G}^1|, |\mathsf{i} : \mathsf{G}^1 \to \mathsf{G}^1|,
|\mathsf{e} : \mathsf{G}^0 \to \mathsf{G}^1|, and all composites that could
be made with these arrows. See my <a href="/posts/category-theory-syntactically.html#finite-product-theories">previous article</a>
for a more explicit description of this, but it should be fairly intuitive.</p>
<p>An actual group is then, simply, a finite-product-preserving functor
|\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}|. It must be finite-product-preserving
so that |\mathsf{m}| actually gets sent to a binary function
and not some function with some arbitrary domain. The category, |\mathbf{Grp}|, of
groups and group homomorphisms is equivalent to the category |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Grp}}}|
which is defined to be the full subcategory of the category of functors from |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}|
consisting of the functors which preserve finite products. While we won’t explore it further
here, we could use <em>any</em> category with finite products as the target, not just |\mathbf{Set}|.
For example, we’ll show that |\mathbf{Grp}| has finite products, and in fact all limits
and colimits, so we can talk about the models of the theory of groups in the
category of groups. This turns out to be equivalent to the category of Abelian
groups via the well-known <a href="https://en.wikipedia.org/wiki/Eckmann%E2%80%93Hilton_argument">Eckmann-Hilton argument</a>.</p>
<h2 id="a-bit-of-organization">A Bit of Organization</h2>
<p>First, a construction that will become even more useful later. Given any category, |\mathcal{C}|,
we define |\mathcal{C}^{\times}|, or, more precisely, an inclusion |\sigma : \mathcal{C} \hookrightarrow \mathcal{C}^{\times}|
to be the <strong>free category-with-finite-products generated from</strong> |\mathcal{C}|. Its universal property is:
given any functor |F : \mathcal{C} \to \mathcal{E}| into a category-with-finite-products |\mathcal E|,
there exists a unique finite-product-preserving functor |\bar{F} : \mathcal{C}^{\times} \to \mathcal E|
such that |F = \bar{F} \circ \sigma|.</p>
<p>An explicit construction of |\mathcal{C}^{\times}| is the following. Its objects consist
of (finite) lists of objects of |\mathcal{C}| with concatenation as the categorical product
and the empty list as the terminal object. The arrows are tuples with a component for each
object in the codomain list. Each component is a pair of an index into the domain list and
an arrow from the corresponding object in the domain list to the object in the codomain list
for this component. For example, the arrow |[A, B] \to [B, A]| would be |((1, id_B), (0, id_A))|.
The idea is that |((k_1, f_1), \dots, (k_n, f_n))| will be interpreted as
|\langle f_1 \circ \pi_{k_1}, \dots, f_n \circ \pi_{k_n}\rangle| where |\pi_{k_i}| is the
projection onto the |k_i|-th component of the input.
Identity and composition are straightforward. |\sigma| then maps each object to a singleton
list and each arrow |f| to |((0, f))|.</p>
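<p>The explicit construction can be sketched in Python under a simplifying assumption: the arrows
of |\mathcal C| are modeled as Python functions, and objects are left implicit. The helper names
are hypothetical. One natural way to define composition, consistent with the interpretation above,
is: if component |i| of |g| is |(j, g_i)| and component |j| of |f| is |(k, f_j)|, then component |i|
of |g \circ f| is |(k, g_i \circ f_j)|. The function <code>interpret</code> reads an arrow as
|\langle f_1 \circ \pi_{k_1}, \dots, f_n \circ \pi_{k_n}\rangle| acting on tuples.</p>

```python
# Hypothetical model: arrows of C are Python functions; objects stay implicit.
# An arrow of C^x into a list of n objects is a tuple of n pairs (k, f), where
# k indexes the domain list and f goes from that object to the target object.

def _comp(p, q):
    return lambda x: p(q(x))

def ident(x):
    return x

def identity(length):
    """Identity on a list of `length` objects: ((0, id), (1, id), ...)."""
    return tuple((k, ident) for k in range(length))

def compose(g, f):
    """g . f: component (j, g_i) of g and component (k, f_j) of f give (k, g_i . f_j)."""
    return tuple((f[j][0], _comp(g_i, f[j][1])) for j, g_i in g)

def sigma(f):
    """sigma sends an arrow f of C to the one-component arrow ((0, f),)."""
    return ((0, f),)

def interpret(arrow):
    """Read ((k_1, f_1), ..., (k_n, f_n)) as <f_1 . pi_{k_1}, ..., f_n . pi_{k_n}>."""
    return lambda xs: tuple(f_i(xs[k]) for k, f_i in arrow)

swap = ((1, ident), (0, ident))                      # [A, B] -> [B, A]
print(interpret(swap)((3, "hi")))                     # ('hi', 3)
f = ((1, str.upper), (0, lambda n: n + 1))           # [int, str] -> [str, int]
g = ((1, lambda n: 10 * n), (0, lambda s: s + "!"))  # [str, int] -> [int, str]
print(interpret(compose(g, f))((3, "hi")))            # (40, 'HI!')
print(interpret(g)(interpret(f)((3, "hi"))))          # (40, 'HI!')
```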
<p>Like most free constructions, this construction completely ignores any finite products the
original category may have had. In particular, we want the category
|\mathcal{T}_{\mathbf{Set}} = \mathbf{1}^{\times}|, called <strong>the theory of a set</strong>.
The fact that the one object of the category |\mathbf{1}| is terminal has nothing to do
with its image via |\sigma| which is not the terminal object.</p>
<p>We now define the general notion of a (<strong>Lawvere</strong>) <strong>theory</strong> as a small category
with finite products, |\mathcal{T}|, equipped with a finite-product-preserving, identity-on-objects
functor |\mathcal{T}_{\mathbf{Set}} \to \mathcal{T}|. A <strong>morphism of</strong> (<strong>Lawvere</strong>) <strong>theories</strong>
is a finite-product-preserving functor that preserves these inclusions à la:
\[
\xymatrix {
&amp; \mathcal{T}_{\mathbf{Set}} \ar[dl] \ar[dr] &amp; \\
\mathcal{T}_1 \ar[rr] &amp; &amp; \mathcal{T}_2
}
\]</p>
<p>The identity-on-objects aspect of the inclusion of |\mathcal{T}_{\mathbf{Set}}|
along with finite-product-preservation ensures that the only objects in |\mathcal{T}|
are powers of a single object which we’ll generically call |\mathsf{G}|. This is
sometimes called the “generic object”, though the term “generic object” has other
meanings in category theory. To be clear, if |F| is an identity-on-objects functor, we’re not just
saying |FX = X| for every object |X|, but that the object part of the functor is the identity
function, i.e. if |F : \mathcal C \to \mathcal D|, then |\mathcal C| and |\mathcal D| have
<em>exactly the same</em> objects.</p>
<p>A <strong>model of a theory</strong> (in |\mathbf{Set}|) is then simply a finite-product-preserving
functor into |\mathbf{Set}|. |\mathbf{Mod}_{\mathcal{T}}| is the full subcategory
of the functor category |[\mathcal{T}, \mathbf{Set}]| consisting of the functors which preserve finite products.
The <strong>morphisms of models</strong> are simply the natural transformations. As an exercise,
you should show that for a natural transformation |\tau : M \to N| where |M|
and |N| are two models of the same theory, |\tau_{\mathsf{G}^n} = \tau_{\mathsf{G}}^n|.</p>
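<p>A minimal Python sketch of a model may help here. Every name in it is hypothetical. It takes the theory of groups and describes each arrow |\mathsf{G}^n \to \mathsf{G}^m| by an |m|-tuple of terms in |n| variables. A model sends |\mathsf{G}^n| to the set of |n|-tuples over its carrier, and sends each arrow to the function obtained by evaluating its terms. The example uses |M = \mathbb{Z}/6| and |N = \mathbb{Z}/3| under addition, with |\tau_{\mathsf{G}}| given by reduction mod 3. For one arrow, the final assertion checks the naturality square whose components at |\mathsf{G}^2| are |\tau_{\mathsf{G}}| applied coordinatewise.</p>

```python
from typing import Callable, Tuple

# Terms of the theory of groups (hypothetical encoding):
#   ("var", i)     the i-th input
#   ("unit",)      the nullary operation
#   ("inv", t)     the unary operation
#   ("mul", s, t)  the binary operation
Term = tuple


class GroupModel:
    """A model given by a carrier's operations; M(G^n) is the set of n-tuples."""

    def __init__(self, unit, inv: Callable, mul: Callable):
        self.unit, self.inv, self.mul = unit, inv, mul

    def evaluate(self, t: Term, xs: tuple):
        tag = t[0]
        if tag == "var":
            return xs[t[1]]
        if tag == "unit":
            return self.unit
        if tag == "inv":
            return self.inv(self.evaluate(t[1], xs))
        return self.mul(self.evaluate(t[1], xs), self.evaluate(t[2], xs))

    def arrow(self, terms: Tuple[Term, ...]) -> Callable[[tuple], tuple]:
        """The image of an arrow G^n -> G^m, given as an m-tuple of terms."""
        return lambda xs: tuple(self.evaluate(t, xs) for t in terms)


Z6 = GroupModel(0, lambda a: (-a) % 6, lambda a, b: (a + b) % 6)
Z3 = GroupModel(0, lambda a: (-a) % 3, lambda a, b: (a + b) % 3)


def tau(a: int) -> int:
    """The component at G: a group homomorphism Z/6 -> Z/3."""
    return a % 3


def tau_power(xs: tuple) -> tuple:
    """The candidate component at G^n: tau applied in each coordinate."""
    return tuple(tau(x) for x in xs)


# An arrow G^2 -> G^2 sending (x0, x1) to (x0 * x1^-1, unit).
f = (("mul", ("var", 0), ("inv", ("var", 1))), ("unit",))

# The naturality square for f, checked on every input pair.
assert all(
    tau_power(Z6.arrow(f)((a, b))) == Z3.arrow(f)(tau_power((a, b)))
    for a in range(6)
    for b in range(6)
)
```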
<h2 id="the-easy-categorical-constructions">The Easy Categorical Constructions</h2>
<p>This relatively simple definition of model already gives us a large swathe of results.
An easy result in basic category theory is that (co)limits in functor
categories are computed pointwise whenever the corresponding (co)limits exist
in the codomain category. In our case, |\mathbf{Set}| has all (co)limits, so
all categories of |\mathbf{Set}|-valued functors have all (co)limits and they
are computed pointwise.</p>
<p>However, the (co)limit of finite-product-preserving functors into |\mathbf{Set}|
may not be finite-product-preserving, so we don’t immediately get that |\mathbf{Mod}_{\mathcal{T}}|
has all (co)limits (and they are computed pointwise). That said, finite products
are limits and limits commute with each other, so we <em>do</em> get that |\mathbf{Mod}_{\mathcal{T}}|
has all limits and they are computed pointwise. Similarly, sifted colimits, which are
colimits that commute with finite products in |\mathbf{Set}|, also exist and are
computed pointwise in |\mathbf{Mod}_{\mathcal{T}}|. Sifted colimits include the
better-known filtered colimits, which commute with all finite limits.</p>
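<p>Here is a quick numeric sanity check of both claims. It is a Python sketch with hypothetical names, assuming finite groups encoded by their elements and operations, so a model's value at |\mathsf{G}^n| is the set of |n|-tuples. The pointwise product, with componentwise operations on pairs, satisfies the group axioms. The pointwise coproduct fails to preserve finite products already on cardinalities: at |\mathsf{G}^n| it has |\lvert G\rvert^n + \lvert H\rvert^n| elements, rather than |(\lvert G\rvert + \lvert H\rvert)^n|.</p>

```python
from itertools import product


# Hypothetical encoding: a finite group as (elements, unit, inverse, multiplication).
def cyclic(n):
    return (list(range(n)), 0, lambda a: (-a) % n, lambda a, b: (a + b) % n)


def pointwise_product(g, h):
    """Product of models computed pointwise: pairs with componentwise operations."""
    (gs, ge, ginv, gmul), (hs, he, hinv, hmul) = g, h
    return (
        list(product(gs, hs)),
        (ge, he),
        lambda p: (ginv(p[0]), hinv(p[1])),
        lambda p, q: (gmul(p[0], q[0]), hmul(p[1], q[1])),
    )


def is_group(grp):
    """Brute-force check of the group axioms on a finite carrier."""
    elems, e, inv, mul = grp
    return (
        all(mul(e, a) == a == mul(a, e) for a in elems)
        and all(mul(a, inv(a)) == e == mul(inv(a), a) for a in elems)
        and all(
            mul(mul(a, b), c) == mul(a, mul(b, c))
            for a, b, c in product(elems, repeat=3)
        )
    )


def pointwise_coproduct_size(g, h, n):
    """Size of (G + H)(G^n) = G(G^n) + H(G^n) when computed pointwise."""
    return len(g[0]) ** n + len(h[0]) ** n


def preserved_size(g, h, n):
    """Size a finite-product-preserving functor would need: ((G + H)(G^1))^n."""
    return (len(g[0]) + len(h[0])) ** n
```

<p>At |n = 0| the pointwise coproduct has two elements, one from each model, whereas preserving the empty product requires exactly one. At |n = 1| the counts agree, which is part of why the failure is easy to miss.</p>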
<p>I’ll not elaborate on sifted colimits.
We’re here for (finite) coproducts, and, as you’ve probably already guessed,
coproducts are not sifted colimits.</p>
<h2 id="when-the-coproduct-of-groups-is-easy">When the Coproduct of Groups is Easy</h2>
<p>There is one class of groups whose coproduct is easy to compute for
general reasons: the free groups. The free group construction, like
most “free constructions”, is a left adjoint and left adjoints
preserve colimits, so the coproduct of two free groups is just the
free group on the coproduct, i.e. disjoint union, of their generating
sets. We haven’t defined the free group yet, though.</p>
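<p>The next Python sketch makes this concrete. It assumes the standard reduced-word presentation of free groups rather than anything derived here. An element of the free group on a set is a freely reduced list of <code>(generator, exponent)</code> letters, where the exponent is <code>1</code> or <code>-1</code>. Tagging generators with <code>0</code> or <code>1</code> realizes the disjoint union. <code>copair</code> builds the homomorphism out of the coproduct that is determined by images of the generators. All names are hypothetical.</p>

```python
from typing import Hashable, List, Tuple

Letter = Tuple[Hashable, int]  # (generator, +1 or -1)
Word = List[Letter]


def reduce_word(word: Word) -> Word:
    """Cancel adjacent inverse pairs using a stack; the result is freely reduced."""
    out: Word = []
    for gen, exp in word:
        if out and out[-1] == (gen, -exp):
            out.pop()
        else:
            out.append((gen, exp))
    return out


def mul(u: Word, v: Word) -> Word:
    return reduce_word(u + v)


def inv(u: Word) -> Word:
    return [(gen, -exp) for gen, exp in reversed(u)]


def inject(tag: int, u: Word) -> Word:
    """Coproduct injection: tag each generator, realizing the disjoint union."""
    return [((tag, gen), exp) for gen, exp in u]


def copair(f, g):
    """Given generator images f (tag 0) and g (tag 1) as words in some free
    group, return the induced homomorphism out of the free group on the
    tagged union of generators."""

    def h(word: Word) -> Word:
        result: Word = []
        for (tag, gen), exp in word:
            image = (f if tag == 0 else g)(gen)
            result = mul(result, image if exp == 1 else inv(image))
        return result

    return h
```

<p>The mediating map only needs to know where the generators go. That is the universal property of the free group on the disjoint union, and hence of the coproduct of the two free groups.</p>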
<p>Normally, the free group construction would be defined as left
adjoint to the underlying set functor. We have a very straightforward
way to define the underlying set functor. Define |U : \mathbf{Mod}_{\mathcal T} \to \mathbf{Set}|
as |U(M) = M(\mathsf{G}^1)| and |U(\tau) = \tau_{\mathsf{G}^1}|.
Identifying |\mathsf{G}^1| with the functor |\mathsf G : \mathbf{1} \to \mathcal{T}|
we have |U(M) = M \circ \mathsf{G}| giving a functor |\mathbf{1} \to \mathbf{Set}|
which we identify with a set. The left adjoint to precomposition
by |\mathsf{G}| is the left Kan extension along |\mathsf{G}|.</p>
<p>We then compute |F(S) = \mathrm{Lan}_{\mathsf{G}}(S)
\cong \int^{{*} : \mathbf{1}} \mathcal{T}(\mathsf{G}({*}), {-}) \times S({*})
\cong \mathcal{T}(\mathsf{G}^1, {-}) \times S|. This is the left
Kan extension and does form an adjunction, but <em>not</em> with the category
of models, because the functor |F(S)| does not preserve finite products.
We should have |F(S)(\mathsf{G}^n) \cong F(S)(\mathsf{G})^n|, but substituting
in the definition of |F(S)| clearly does not satisfy this. For example,
|F(\varnothing)(\mathsf{G}^0) \cong \mathcal{T}(\mathsf{G}^1, \mathsf{G}^0) \times \varnothing = \varnothing|,
while preserving the empty product requires |F(\varnothing)(\mathsf{G}^0)| to be a singleton.</p>
<p>We can and will show that the left Kan extension of a functor into |\mathbf{Set}|
preserves finite products when the original functor did. Once
we have that result we can correct our definition of the free construction.
We simply replace |S : \mathbf{1} \to \mathbf{Set}| with a functor that
<em>does</em> preserve finite products, namely |\bar{S} : \mathbf{1}^{\times} \to \mathbf{Set}|.
Of course, |\mathbf{1}^{\times}| is exactly our definition of |\mathcal{T}_{\mathbf{Set}}|.
We see now that a model of |\mathcal{T}_{\mathbf{Set}}| is the same thing as having
a set, hence the name. Indeed, we have an equivalence of categories between |\mathbf{Set}|
and |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Set}}}|. (More generally, this theory is called
“the theory of an object” as we may consider models in categories other than |\mathbf{Set}|,
and we’ll still have this relation.)</p>
<p>The correct definition of |F| is |F(S) = \mathrm{Lan}_{\iota}(\bar S)
\cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times \bar{S}(\mathsf{G}^n)
\cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times S^n|
where |\iota : \mathcal{T}_{\mathbf{Set}} \to \mathcal{T}| is the inclusion
we give as part of the definition of a theory. We can also see |\iota| as |\bar{\mathsf{G}}|.</p>
<p>We can start to see the term algebra in this definition. An element of |F(S)| is
a choice of |n|, an |n|-tuple of elements of |S|, and a (potentially compound) |n|-ary operation.
We can think of an element of |\mathcal{T}(\mathsf{G}^n, {-})| as a term with |n|
free variables which we’ll label with the elements of |S^n| in |F(S)|. The equivalence
relation in the explicit construction of the coend allows us to swap projections and
tupling morphisms from the term to the tuple of labels. For example, it equates a
unary term paired with one label with a binary term paired with two labels but where
the binary term immediately discards one of its inputs. Essentially, if you are
given a unary term and two labels, you can either discard one of the labels or you
can make the unary term binary by precomposing with a projection. Similarly for
tupling.</p>
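<p>A small Python sketch of this description follows, with hypothetical names. For simplicity it assumes a theory with no equations, so terms are plain syntax trees. <code>plug</code> substitutes the labels into a term, giving a single term whose variables are elements of |S|. <code>rename</code> precomposes a term with an arrow of |\mathcal{T}_{\mathbf{Set}}|, written as a tuple <code>phi</code> in which <code>phi[i]</code> is the input feeding variable <code>i</code>. <code>act</code> is the same arrow acting on a tuple of labels. Pairs related by moving <code>phi</code> from the term to the labels plug to the same term. For a theory with equations, such as groups, the resulting terms would additionally be taken modulo the axioms.</p>

```python
# Hypothetical encoding of terms: ("var", i) or (op_name, arg_1, ..., arg_k).


def plug(term, labels):
    """Substitute labels[i] for variable i, giving a term over the label set."""
    if term[0] == "var":
        return ("var", labels[term[1]])
    return (term[0],) + tuple(plug(arg, labels) for arg in term[1:])


def rename(term, phi):
    """Precompose a term in n variables with the arrow G^m -> G^n of the theory
    of a set given by phi, where phi[i] is the input feeding variable i.
    The result is a term in m variables."""
    if term[0] == "var":
        return ("var", phi[term[1]])
    return (term[0],) + tuple(rename(arg, phi) for arg in term[1:])


def act(phi, labels):
    """The same arrow acting on an m-tuple of labels, giving an n-tuple."""
    return tuple(labels[k] for k in phi)


u = ("inv", ("var", 0))
# The example above: a unary term with one label versus the same term viewed
# as a binary term that discards its second input, paired with two labels.
assert plug(u, ("a",)) == plug(rename(u, (0,)), ("a", "b"))
```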
<p>It’s still not obvious this definition produces a functor which preserves finite products.
As a lemma to help in the proof of that fact, we have a bit of <a href="https://arxiv.org/abs/1501.02503">coend calculus</a>.</p>
<p><strong>Lemma 1</strong>: Let |F \dashv U : \mathcal{D} \to \mathcal{C}| and |H : \mathcal D^{op} \times \mathcal{C} \to \mathcal{E}|.
Then, |\int^C H(FC, C) \cong \int^D H(D, UD)| when one, and thus both, exist. <strong>Proof</strong>:
\[
\begin{align}
\mathcal{E}\left(\int^C H(FC, C), {-}\right)
&amp; \cong \int_C \mathcal{E}(H(FC, C), {-}) \tag{continuity} \\
&amp; \cong \int_C \int_D [\mathcal{D}(FC, D), \mathcal{E}(H(D, C), {-})] \tag{Yoneda} \\
&amp; \cong \int_C \int_D [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{adjunction} \\
&amp; \cong \int_D \int_C [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{Fubini} \\
&amp; \cong \int_D \mathcal{E}(H(D, UD), {-}) \tag{Yoneda} \\
&amp; \cong \mathcal{E}\left(\int^D H(D, UD), {-}\right) \tag{continuity} \\
&amp; \square
\end{align}
\]</p>
<p>Using the adjunctions |\Delta \dashv \times : \mathcal{C} \times \mathcal{C}\to \mathcal{C}|
and |{!}_1 \dashv 1 : \mathbf{1} \to \mathcal{C}|, where we’re treating |1| as the functor
|\mathbf{1}\to\mathcal{C}| which picks out a terminal object of |\mathcal{C}|,
gives the following corollary.</p>
<p><strong>Corollary 2</strong>: For any |H : \mathcal{C}^{op} \times \mathcal{C}^{op} \times \mathcal{C} \to \mathcal{E}|,
\[\int^{C} H(C, C, C) \cong \int^{C_1}\int^{C_2} H(C_1, C_2, C_1 \times C_2)\] when
both exist, and for any |H' : \mathcal{C} \to \mathcal{E}|, |H'(1) \cong \int^C H'(C)|.
The former allows us to combine two coends into one. The latter reproduces a standard result about
colimits over diagrams whose index category has a terminal object.</p>
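<p>The second part can be sanity-checked on small examples. This Python sketch, with hypothetical names, computes the colimit of a diagram of finite sets indexed by a finite poset in the usual way: take the disjoint union of the sets and quotient by |x \sim f(x)| for each arrow |f| of the diagram, using a union-find structure. When the index poset has a top element, which is a terminal object, the number of equivalence classes matches the size of the set at the top.</p>

```python
def colimit_size(sets, arrows):
    """Number of elements in the colimit of a diagram of finite sets.

    sets:   dict from index to a list of elements
    arrows: list of (i, j, f) where f is a dict from sets[i] to sets[j];
            generating arrows suffice, since composites add no new identifications
    """
    # Each node is (index, element), i.e. an element of the disjoint union.
    parent = {(i, x): (i, x) for i, xs in sets.items() for x in xs}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Identify x in sets[i] with f(x) in sets[j].
    for i, j, f in arrows:
        for x, y in f.items():
            parent[find((i, x))] = find((j, y))
    return len({find(node) for node in parent})


# Index poset: a and b below a top element c (c is terminal).
sets = {"a": [0, 1], "b": ["p"], "c": ["u", "v", "w"]}
arrows = [("a", "c", {0: "u", 1: "u"}), ("b", "c", {"p": "v"})]
assert colimit_size(sets, arrows) == len(sets["c"])
```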
<p>Now our theorem.</p>
<p><strong>Theorem 3</strong>: Let |F : \mathcal{T}_1 \to \mathbf{Set}| and |J : \mathcal{T}_1 \to \mathcal{T}_2|
where |\mathcal{T}_1| and |\mathcal{T}_2| have finite products. Then |\mathrm{Lan}_J(F)|
preserves finite products if |F| does.</p>
<p><strong>Proof</strong>:
\[
\begin{flalign}
\mathrm{Lan}_J(F)(X \times Y)
&amp; \cong \int^A \mathcal{T}_2(J(A), X \times Y) \times F(A) \tag{coend formula for left Kan extension} \\
&amp; \cong \int^A \mathcal{T}_2(J(A), X) \times \mathcal{T}_2(J(A), Y) \times F(A) \tag{continuity} \\
&amp; \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1 \times A_2) \tag{Corollary 2} \\
&amp; \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1) \times F(A_2) \tag{finite product preservation} \\
&amp; \cong \left(\int^{A_1}\mathcal{T}_2(J(A_1), X) \times F(A_1) \right) \times \left(\int^{A_2}\mathcal{T}_2(J(A_2), Y) \times F(A_2)\right) \tag{commutativity and cocontinuity of $\times$} \\
&amp; \cong \mathrm{Lan}_J(F)(X) \times \mathrm{Lan}_J(F)(Y) \tag{coend formula for left Kan extension}
\end{flalign}
\]
and for the <span class="math inline">0</span>-ary product case:
\[
\begin{flalign}
\mathrm{Lan}_J(F)(1)
&amp; \cong \int^A \mathcal{T}_2(J(A), 1) \times F(A) \tag{coend formula for left Kan extension} \\
&amp; \cong \int^A 1 \times F(A) \tag{continuity} \\
&amp; \cong 1 \times F(1) \tag{Corollary 2} \\
&amp; \cong 1 \times 1 \tag{finite product preservation} \\
&amp; \cong 1 \tag{1 is the unit of $\times$}
\end{flalign}
\]
|\square|</p>
<h2 id="the-coproduct-of-groups">The Coproduct of Groups</h2>
<p>To get general coproducts (and all colimits), we’ll show that |\mathbf{Mod}_{\mathcal{T}}|
is a <a href="https://ncatlab.org/nlab/show/reflective+subcategory"><em>reflective</em> subcategory</a> of
|[\mathcal{T}, \mathbf{Set}]|. Write
|\iota : \mathbf{Mod}_{\mathcal{T}} \hookrightarrow [\mathcal{T}, \mathbf{Set}]|.
If we had a functor |R| such that |R \dashv \iota|, then |\iota| being full and faithful implies
|\varepsilon : R \circ \iota \cong Id| which allows us to quickly produce colimits
in the subcategory via |\int^I D(I) \cong R\int^I \iota D(I)|.
It’s easy to verify that |R\int^I \iota D(I)| has the appropriate universal property
to be |\int^I D(I)|.</p>
<p>We’ll compute |R| by composing two adjunctions. First,
we have |\bar{({-})} \dashv \iota({-}) \circ \sigma : \mathbf{Mod}_{\mathcal{T}^{\times}} \to [\mathcal T, \mathbf{Set}]|.
This is essentially the universal property of |\mathcal{T}^{\times}|.
When |\mathcal{T}| has finite products, which, of course, we’re assuming,
we can use the universal property of |\mathcal{T}^{\times}| to factor |Id_{\mathcal{T}}|
into |Id = \bar{Id} \circ \sigma|. The second adjunction is then |\mathrm{Lan}_{\bar{Id}} \dashv {-} \circ \bar{Id} : \mathbf{Mod}_{\mathcal{T}} \to \mathbf{Mod}_{\mathcal{T}^{\times}}|.
To verify that these are well-defined, i.e. they produce finite-product-preserving functors,
we argue as follows. The left adjoint sends finite-product-preserving functors to
finite-product-preserving functors via Theorem 3. The right adjoint is the composition of
finite-product-preserving functors.</p>
<p>The composite of the right adjoints is |\iota({-} \circ \bar{Id}) \circ \sigma = \iota({-}) \circ \bar{Id} \circ \sigma = \iota({-})|, as it should be.
The composite of the left adjoints is
\[
\begin{align}
R(F)
&amp; = \mathrm{Lan}_{\bar{Id}}(\bar{F}) \\
&amp; \cong \int^X \mathcal{T}(\bar{Id}(X), {-}) \times \bar{F}(X) \\
&amp; \cong \int^X \mathcal{T}\left(\prod_{i=1}^{\lvert X\rvert} X_i, {-}\right) \times \prod_{i=1}^{\lvert X \rvert} F(X_i)
\end{align}
\] where we view the list |X : \mathcal{T}^{\times}| as a |\lvert X\rvert|-tuple with components |X_i|.</p>
<p>This construction of the reflector, |R|, is quite similar to the free construction.
The main difference is that here we factor |Id| via |\mathcal{T}^{\times}| where there we factored
|\mathsf{G} : \mathbf{1} \to \mathcal{T}| via |\mathbf{1}^{\times} = \mathcal{T}_{\mathbf{Set}}|.</p>
<p>Let’s now explicitly describe the coproducts via |R|. As a warm-up, we’ll consider the initial object, i.e.
nullary coproducts. We consider |R(\Delta 0)|, where |\Delta 0| is the constant functor at the empty set,
the initial object of |[\mathcal{T}, \mathbf{Set}]|. Because |0 \times S = 0|, the only case in the coend
that isn’t |0| is when |\lvert X \rvert = 0|, so the underlying set of the coend reduces to
|\mathcal{T}(\mathsf{G}^0, \mathsf{G}^1)|, i.e. the nullary terms. For groups, this is just the unit
element. For bounded lattices, it would be the two-element set consisting of the top and bottom elements.
For lattices without bounds, it would be the empty set. Of course, |R(\Delta 0)| matches |F(0)|, i.e.
the free model on |0|.</p>
<p>Next, we consider two models |G| and |H|. First, we compute the coproduct of |G| and |H| as (plain)
functors, which is just computed pointwise, i.e. |(G+H)(\mathsf{G}^n) = G(\mathsf{G}^n)+H(\mathsf{G}^n) \cong G(\mathsf{G}^1)^n + H(\mathsf{G}^1)^n|.
Considering the case where |X_i = \mathsf{G}^1| for all |i| and where |\lvert X \rvert = n|, which
subsumes all the other cases, we see we have a term with |n| free variables each labelled by either
an element of |G| or an element of |H|. If we normalized the term into a list of variables representing
a product of variables, then we’d essentially have a <strong>word</strong> as described on <a href="https://en.wikipedia.org/wiki/Free_product#Construction">the Wikipedia page
for the free product</a>. If we then only considered
quotienting by the equivalences induced by projection and tupling, we’d have the free group on the
disjoint union of the underlying sets of |G| and |H|. However, for |R|, we also quotient by
the action of the other operations. The lists of objects with |X_i \neq \mathsf{G}^1| come in here
to support equating non-unary ops. For example, a pair of the binary term |\mathsf{m}| and
the 2-tuple of elements |(g_1, g_2)| for |g_1, g_2 \in U(G)|, will be equated with the pair of
the unary term |id| and the 1-tuple of elements |(g)| where |g = g_1 g_2| in |G|. Similarly for
|H| and the other operations (and terms generally). Ultimately, the quotient identifies every
element with one given by a pair consisting of a term that is a fully right-associated sequence of
multiplications ending in the unit, together with labels for its variables drawn from
|U(G)| and |U(H)| alternately and never the identity. These are the <strong>reduced words</strong> in the Wikipedia
article.</p>
<p>This, perhaps combined with a more explicit spelling out of the equivalence relation, should make
it clear that this construction does actually correspond to the usual free product construction.
The name “free product” is also made a bit clearer, as we are essentially building the free
group on the disjoint union of the underlying sets of the inputs, and then quotienting that to
get the result. While there are some categorical treatments of normalization, the normalization
arguments used above were not guided by the category theory. The (underlying sets of the) models
produced by the above |F| and |R| functors are big equivalence classes of “terms”. The above
constructions provide no guidance for finding “good” representatives of those equivalence classes.</p>
<h2 id="conclusions">Conclusions</h2>
<p>This was, of course, a very complex and roundabout way of answering the title question.
Obviously the real goal was to illustrate these ideas and to show how “abstract” categorical
reasoning can lead to relatively “concrete” results. Of course, these concrete constructions
are derived from other concrete constructions, usually concrete constructions of limits and
colimits in |\mathbf{Set}|. That said, category theory allows you to get a lot from a small
collection of relatively simple concrete constructions. Essentially, category theory is like
a programming language with a small set of primitives. You can write “abstract” programs in
terms of that language, but once you provide an “implementation” for those primitives, all
those “abstract” programs can be made concrete.</p>
<p>I picked (finite) coproducts, in particular, as they are where a bunch of complexity suddenly
arises when studying algebraic objects categorically, but (finite) coproducts are still
fairly simple.</p>
<p>For Lawvere theories, one thing to note is that the Lawvere theory is independent of the
presentation. Any presentation of the axioms of a group would give rise to the same Lawvere
theory. Of course, to explicitly describe the category would end up requiring a presentation
of the category anyway. Beyond Lawvere theories are algebraic theories and algebraic categories,
and further into essentially algebraic theories and categories. These extend to the multi-sorted
case and then into the finite limit preserving case. The theory of categories, for example,
cannot be presented as a Lawvere theory but is an essentially algebraic theory. There’s much
more that can be said even about specifically Lawvere theories, both from a theoretical
perspective, starting with monadicity, and from practical perspectives like algebraic effects.</p>
<p>Familiarity with the properties of functor categories, and especially categories of (co)presheaves
was behind many of these results, and many that I only mentioned in passing. It is always
useful to learn more about categories of presheaves. That said, most of the theory works
in an enriched context and often without too many assumptions. The fact that all we need
to talk about models is for the codomains of the functors to have finite products allows
quite broad application. We can talk about algebraic objects almost anywhere. For example,
sheaves of rings, groups, etc. can equivalently be described as models of the theories of
rings, groups, etc. in sheaves of sets.</p>
<p>Kan extensions unsurprisingly played a large role, as they almost always do when you’re
talking about (co)presheaves. One of my motivations for writing this article was
a happy confluence of things I was reading, which led to a nice, coend-calculus way of
describing and proving finite-product-preservation for free models.</p>
<p>Thinking about what <em>exactly</em> was going on around finite-product-preservation was fairly interesting.
The incorrect definition of the free model functor could be corrected in a different
(though, of course, ultimately equivalent) way. The key is to remember that the coend
formula for the left Kan extension generally involves a <em>copower</em> and not a cartesian product.
The copower for |\mathbf{Set}|-valued functors is different from the copower for
finite-product-preserving |\mathbf{Set}|-valued functors. For a category with (arbitrary)
coproducts, the copower corresponds to the coproduct of a constant family. We get
|F(S) \cong \coprod_{S} \mathcal T(\mathsf{G}^1, {-})|, as is immediately
evident from |F| being a left adjoint and a set |S| being the coproduct of |S|-many copies of |1|.
For the purposes of this article, this would have been less than satisfying as figuring
out what coproducts were was the nominal point.</p>
<p>That said, it isn’t completely unsatisfying
as this defines the free model in terms of a coproduct of, specifically, representables
and those are more tractable. In particular, an easy and neat exercise is to work
out what |\mathcal{T}(\mathsf{G}^n, {-}) + \mathcal{T}(\mathsf{G}^m, {-})| is.
Just use Yoneda and work out what must be true of the mapping out property, and remember
that the object you’re mapping into preserves finite products. Once you have finite
coproducts described, you can get all the rest via filtered colimits.</p>]]></description>
    <pubDate>2023-12-21 18:47:57-08:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/what-is-the-coproduct-of-two-groups.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>
<item>
    <title>Preserving, Reflecting, and Creating Limits</title>
    <link>https://www.hedonisticlearning.com/posts/preserves-reflects-creates.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>This is a brief article about the notions of preserving, reflecting, and creating limits and,
by duality, colimits. Preservation is relatively intuitive, but the distinction between
reflection and creation is subtle.</p>
<h2 id="preservation-of-limits">Preservation of Limits</h2>
<p>A functor, |F|, <strong>preserves limits</strong> when it takes limiting cones to limiting cones. As
often happens in category theory texts, the notation focuses on the objects. You’ll often
see things like |F(X \times Y) \cong FX \times FY|, but implied is that one direction of
this isomorphism is the canonical morphism |\langle F\pi_1, F\pi_2\rangle|. To put it yet
another way, in this example we require |F(X \times Y)| to satisfy the universal property
of a product with the projections |F\pi_1| and |F\pi_2|.</p>
<p>Other than that subtlety, preservation is fairly intuitive.</p>
<h2 id="reflection-of-limits-versus-creation-of-limits">Reflection of Limits versus Creation of Limits</h2>
<p>A functor, |F|, <strong>reflects limits</strong> if, whenever the image of a <em>cone</em> is a limiting cone,
the original cone was already a limiting cone. For products this would mean that if we
had a span |A \stackrel{p}{\leftarrow} Z \stackrel{q}{\to} B|, and |FZ| was the product
of |FA| and |FB| with projections |Fp| and |Fq|, then |Z| was the product of |A| and |B|
with projections |p| and |q|.</p>
<p>A functor, |F|, <strong>creates limits</strong> if, whenever the image of a <em><strong>diagram</strong> has a limit</em>,
the diagram itself has a limit and |F| preserves the limiting cones. For products
this would mean that if |FX| and |FY| have a product, |FX \times FY|, then |X| and |Y| have
a product and |F(X \times Y) \cong FX \times FY| via the canonical morphism.</p>
<p>Creation of limits implies reflection of limits since we can just ignore the apex of the
cone. While creation is more powerful, often reflection is enough in practice as we usually
have a candidate limit, i.e. a cone. Again, this is often not made too explicit.</p>
<h3 id="example">Example</h3>
<p>Consider the posets:</p>
<p><span class="math display">$$\xymatrix{
     &amp;          &amp;                   &amp; c \\
X\ar@{}[r]|{\Large{=}} &amp; a \ar[r] &amp; b \ar[ur] \ar[dr] &amp;   \\
     &amp;          &amp;                   &amp; d 
\save "1,2"."3,4"*+[F]\frm{}
\restore
} \qquad \xymatrix{
     &amp;                   &amp; c \\
Y\ar@{}[r]|{\Large{=}} &amp; b \ar[ur] \ar[dr] &amp;   \\
     &amp;                   &amp; d 
\save "1,2"."3,3"*+[F]\frm{}
\restore
} \qquad \xymatrix{
     &amp; c \\ 
Z\ar@{}[r]|{\Large{=}} &amp;   \\ 
     &amp; d
\save "1,2"."3,2"*+[F]\frm{}
\restore
}$$</span></p>
<h4 id="failure-of-reflection">Failure of reflection</h4>
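<p>Let |X=\{a, b, c, d\}| with |a \leq b \leq c| and |b \leq d| map to |Y=\{b, c, d\}|
by sending |a \mapsto b| and every other element to itself. Reflection fails because |a| is a lower bound
of |\{c, d\}| whose image |b| is the meet of |\{c, d\}| in |Y|, yet |a| is not the meet of |\{c, d\}|
in |X|. That meet is |b|.</p>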
<h4 id="failure-of-creation">Failure of creation</h4>
<p>If we change the source to just |Z=\{c, d\}|, then creation fails because |c| and |d| have a meet
in the image but not in the source. Reflection succeeds, though, because |Z| has no cones
over |\{c, d\}| at all, and the remaining cones (over |\{c\}|, |\{d\}|, or the empty diagram)
are mapped to limiting cones only when they were already limiting.</p>
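<p>For finite posets these conditions can be checked by brute force. The Python sketch below uses hypothetical names. It represents a finite poset by its elements and by its order relation, a set of pairs |(x, y)| with |x \leq y|, and ranges over every subset |U| of the source. For reflection, it tests that every lower bound of |U| mapped to the meet of |F[U]| is the meet of |U|. For creation, it tests that whenever |F[U]| has a meet, |U| has a meet and |F| sends that meet to it. On the two examples above, it finds that reflection fails for the first, and that creation fails but reflection holds for the second.</p>

```python
from itertools import combinations


def close(elements, pairs):
    """Reflexive-transitive closure of a relation, giving the order relation."""
    leq = set(pairs).union((x, x) for x in elements)
    changed = True
    while changed:
        changed = False
        for x, y in list(leq):
            for y2, z in list(leq):
                if y == y2 and (x, z) not in leq:
                    leq.add((x, z))
                    changed = True
    return leq


def meet(elements, leq, subset):
    """The greatest lower bound of subset, or None if there is none."""
    lower = [x for x in elements if all((x, u) in leq for u in subset)]
    for x in lower:
        if all((y, x) in leq for y in lower):
            return x
    return None


def subsets(elements):
    elems = sorted(elements)
    for r in range(len(elems) + 1):
        yield from combinations(elems, r)


def reflects(X, leqX, Y, leqY, F):
    """Every lower bound of U that F maps to the meet of F[U] is the meet of U."""
    for U in subsets(X):
        m = meet(Y, leqY, [F[u] for u in U])
        if m is None:
            continue
        meet_U = meet(X, leqX, U)
        for x in X:
            if all((x, u) in leqX for u in U) and F[x] == m and meet_U != x:
                return False
    return True


def creates(X, leqX, Y, leqY, F):
    """Whenever F[U] has a meet, U has a meet and F sends it to the meet of F[U]."""
    for U in subsets(X):
        m = meet(Y, leqY, [F[u] for u in U])
        if m is not None:
            meet_U = meet(X, leqX, U)
            if meet_U is None or F[meet_U] != m:
                return False
    return True


# First example: a maps to b, every other element to itself.
X = {"a", "b", "c", "d"}
leqX = close(X, [("a", "b"), ("b", "c"), ("b", "d")])
Y = {"b", "c", "d"}
leqY = close(Y, [("b", "c"), ("b", "d")])
F = {"a": "b", "b": "b", "c": "c", "d": "d"}
assert not reflects(X, leqX, Y, leqY, F)

# Second example: the discrete poset {c, d} included into Y.
Z = {"c", "d"}
leqZ = close(Z, [])
G = {"c": "c", "d": "d"}
assert not creates(Z, leqZ, Y, leqY, G)
assert reflects(Z, leqZ, Y, leqY, G)
```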
<p>In general, recasting reflection and creation of limits for posets gives us: Let |F: X \to Y| be
a monotonic function. |F| reflects limits if, for every |U \subseteq X|, every lower bound of |U| that |F| maps to the meet of |F[U]| is
already the meet of |U|. |F| creates limits if, whenever |F[U]| has a meet for |U \subseteq X|,
|U| already has a meet and |F| sends the meet of |U| to the meet of |F[U]|.</p>]]></description>
    <pubDate>2023-03-20 22:39:28-07:00</pubDate>
    <guid>https://www.hedonisticlearning.com/posts/preserves-reflects-creates.html</guid>
    <dc:creator>Derek Elkins</dc:creator>
</item>

    </channel>
</rss>
