When sets are first introduced to students, usually examples are used with finite, explicitly presented sets. For
example, |\{1,2,3\} \cup \{2,3,4\} = \{1,2,3,4\}|. This is the beginning of the idea that a set is a “collection” of things.
Later, when infinite sets are introduced, the idea that sets are “collections” of things is still commonly used as
the intuitive basis for the definitions. While I personally find this a bit of a philosophical bait-and-switch, my
main issue with it is that I don’t think it does a good job reflecting how we work with sets day-to-day nor for
more in-depth set-theoretic investigations. Instead, I recommend thinking about infinite sets as defined by properties
and |x \in X| for some infinite set |X| means checking whether it satisfies the property defining |X|, *not* rummaging
through the infinite elements that make up |X| and seeing if |x| is amongst them. This perspective closely reflects
how we prove things about infinite sets. It makes it much clearer that the job is to find logical relationships between
the properties that define sets.

Of course, this view can also be applied to finite sets and *should* be applied to them.
For a constructivist, the notion of “finite set” splits into multiple inequivalent notions, and it is quite easy to show
that there is a subset of |\{0,1\}| which is not “finite” with respect to strong notions of “finite” that are commonly used
by constructivists. Today, though, I’ll stick with classical logic and classical set theory. In particular, I’m going to
talk about the Internal Set Theory of Edward Nelson, or mostly the
small fragment he used in Radically Elementary Probability Theory.
In the first chapter of an unfinished book on Internal Set Theory, he
states the following:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

While it may be a bit strong to say that Internal Set Theory leads to some question about what “finite” means, I think it makes a good case for questioning what “finite set” means. These concerns are similar to the relativity of the notion of “(un)countable set”.

Minimal Internal Set Theory (**minIST**) starts by taking all the axiom( schema)s of **ZFC**
as axioms. So right from the get-go we can do “all of math” in exactly the same way we used to. **minIST** is a conservative extension
of **ZFC**, so a formula of **minIST** can be translated to a formula of **ZFC** that is provable if and only if the original formula
was provable in **minIST**.

Before going into what makes **minIST** different from **ZFC**, let’s talk about what “finite set” means in **ZFC** (and thus **minIST**).
There are multiple equivalent characterizations of a set being “finite”. One common definition
is a set that has no bijection with a proper subset of itself. A more positive and more useful equivalent definition is a set is **finite** if
and only if it has a bijection with |\{k \in \mathbb N \mid k < n\}| for some |n \in \mathbb N|. This latter definition will be handier for our purposes here.

The only new primitive concept **minIST** adds is the concept of a “standard” natural number. That is, a new predicate symbol is added to the
language of **ZFC**, |\mathsf{St}|, and |n \in \mathbb N| is **standard** if and only if |\mathsf{St}(n)| holds. For convenience,
we’ll define the following shorthand: |\forall^{\mathsf{St}}n.P(n)| for |\forall n.\mathsf{St}(n) \Rightarrow P(n)|. This is very
similar to the shorthand: |\forall x \in X.P(x)| which stands for |\forall x.x \in X \Rightarrow P(x)|. In fact, if we had a set of standard natural numbers
then we could use this more common shorthand, but it will turn out that the existence of a set of standard natural numbers is contradictory.

The four additional axiom( schema)s of **minIST** are the following:

- |\mathsf{St}(0)|, i.e. |0| is standard.
- |\forall n.\mathsf{St}(n) \Rightarrow \mathsf{St}(n+1)|, i.e. if |n| is standard, then |n+1| is standard.
- |P(0) \wedge (\forall^{\mathsf{St}}n.P(n) \Rightarrow P(n+1)) \Rightarrow \forall^{\mathsf{St}} n.P(n)| for every formula |P|, i.e. we can do induction over standard natural numbers to prove things about all standard natural numbers.
- |\exists n \in \mathbb N.\lnot \mathsf{St}(n)|, i.e. there exists a natural number that is not standard.

That’s it. There is a bit of a subtlety though. The Axiom Schema of Separation (aka the Axiom Schema of Specification or Comprehension)
is an axiom schema of **ZFC**. It is the axiom schema that justifies the notion of set comprehension, i.e. defining a subset of |X| via |\{x \in X \mid P(x)\}|.
This axiom schema is indexed by the formulas *of ZFC*. When we incorporated

Suppose there is some set |S| such that |n \in S \iff \mathsf{St}(n)|. Using axioms 1 and 2 we can prove that |0 \in S| and
if |n \in S| then |n+1 \in S|. The *internal* form of induction, like all internal theorems, still holds. Internal induction is just
axiom 3 but with |\forall| instead of |\forall^{\mathsf{St}}|. Using internal induction, we can immediately prove that |\forall n \in \mathbb N.n \in S|,
but this means |\forall n \in \mathbb N.\mathsf{St}(n)| which directly contradicts axiom 4, thus there is no such |S|. |\square|

This negative result is actually quite productive. For example, we can’t have a set |N| of nonstandard natural numbers otherwise we could
define |S| as |\mathbb N \setminus N|. This means that if we can prove some *internal* property holds for all nonstandard naturals, then there must
exist a standard natural number for which it holds as well. Properties like this are called **overspill** because an internal property that holds
for the standard natural numbers spills over into the nonstandard natural numbers. Here’s a small example. First, a real number |x| is **infinitesimal**
if and only if there exists a nonstandard natural |\nu| such that |\vert x\vert \leq 1/\nu|. (Note, we can prove that every nonstandard natural is larger than every
standard natural.) Two real numbers |x| and |y| are **nearly equal**, written |x \simeq y|, if and only if |x - y| is infinitesimal. A
function |f : \mathbb R \to \mathbb R| is **(nearly) continuous at |x|** if and only if for all |y \in \mathbb R|, if |x \simeq y| then |f(x) \simeq f(y)|. Now
if |f| is continuous at |x|, then for every standard natural |n| and |y \in \mathbb R|, |\vert x - y\vert \leq \delta \Rightarrow \vert f(x) - f(y)\vert \leq 1/n| holds
for all infinitesimal |\delta| by continuity. Thus by overspill it holds for some non-infinitesimal |\delta|, i.e. for some |\delta > 1/m| for some
standard |m|. |\square|

In *Radically Elementary Probability Theory*, Nelson reformulates much of probability theory and stochastic process theory by restricting
to the finite case. Traditionally, probability theory for finite sample spaces is (relatively) trivial. But remember, “finite” means
bijective with |\{k \in \mathbb N \mid k < n\}| for some |n \in \mathbb N| and that |n| can be nonstandard in **minIST**. The notion of “continuous”
defined before translates to the usual notion when the **minIST** formula is translated to **ZFC**. Similarly, subsets of infinitesimal
probability become subsets of measure zero. Sums of infinitesimals over finite sets of nonstandard cardinality become integrals. Little
to nothing is lost, instead it is just presented in a different language, a language that often lets you say complex things simply.

Again, I want to reiterate that *everything true in* **ZFC** *is true in* **minIST**. One of the issues with Robinson-style nonstandard
analysis is that the hyperreals are not an Archimedean field. An ordered field |F| is Archimedean if for all |x \in F| such that |x > 0|
there is an |n \in \mathbb N| such that |nx > 1|. The hyperreals serve as a model of the (**min**)**IST** reals. *Within* **minIST**, the
Archimedean property holds. Even if |x \in \mathbb R| is infinitesimal, there is still a nonstandard |n \in \mathbb N| such that |nx > 1|. But this
statement translates to the statement in the meta-language that there is a hypernatural |n| such that for every positive
hyperreal |x|, |nx > 1| which does hold. However, if we use the *meta-language’s* notion of “natural number”, this statement is false.

Full Internal Set Theory (**IST**) only has three axiom schemas in addition to the axioms of **ZFC**. Like **minIST**, there is |\mathsf{St}|
but now we talk about standard sets in general, not just standard natural numbers. The three axiom schemas are as follows:

- |\forall^{\mathsf{St}}t_1 \cdots \forall^{\mathsf{St}}t_n.(\forall^{\mathsf{St}} x.P(x, t_1, …, t_n)) \iff \forall x.P(x, t_1, …, t_n)| for internal |P| with no (other) free variables.
- |(\forall^{\mathsf{StFin}}X.\exists y.\forall x \in X.P(x, y)) \iff \exists y.\forall^{\mathsf{St}} x.P(x, y)| for internal |P| which may have other free variables.
- |\forall^{\mathsf{St}}X.\exists^{\mathsf{St}} Y.\forall^{\mathsf{St}}z.(z \in Y \iff z \in X \wedge P(z))| for
*any*formula |P| which may have other free variables.

These are called the Transfer Principle (T), the Idealization Principle (I), and the Standardization Principle (S) respectively. |\forall^{\mathsf{StFin}}X.P(X)| means for all |X| which are both standard and finite. The Transfer Principle essentially states that an internal property is true for all sets if and only if it is true for all standard sets. The Transfer Principle allows us to prove that any internal property that is satisfied uniquely is satisfied by a standard set. This means everything we can define in ZFC is standard, e.g. the set of reals, the set of naturals, the set of functions from reals to naturals, any particular natural, |\pi|, the Riemann sphere.

The Idealization Principle is what allows nonstandard sets to exist. For example, let |P(x, y) \equiv y \in \mathbb N \wedge (x \in \mathbb N \Rightarrow x < y)|. The Idealization
Principle then states that if for any finite set of natural numbers, we can find a natural number greater than all of them, then there is
a natural number greater than any standard natural number. Since we clearly can find a natural number greater than any natural number in some
given finite set of natural numbers, the premise holds and thus we have a natural number greater than any standard one and thus necessarily
nonstandard. In fact, a similar argument can be used to show that *all* infinite sets have nonstandard elements. A related argument shows
that a standard, finite set is exactly a set all of whose elements are standard.

The Standardization Principle isn’t needed to derive **minIST** nor is it needed for the following result so I won’t discuss
it further. Another result derivable from Idealization is: there is a finite set that contains every standard set. To prove it, simply
instantiate the Idealization Principle with |P(x,y) \equiv x \in y \wedge y\ \mathsf{is\ finite}| where “is finite” stands for the formalization
of any of the definitions of “finite” given before. It is in the discussion after proving this that Nelson makes the statement
quoted in the introduction. We can prove some results about this set. First, it can’t be standard itself. Here are three different reasons why:
1) if it were standard, then it would include itself violating the Axiom of Foundation/Regularity; 2) if it were standard, then by Transfer
we could prove that it contains *all* sets again violating the Axiom of Foundation; 3) if it were standard, then by the result mentioned in
the previous paragraph, it would contain *only* standard elements, but then we could intersect it with |\mathbb N| to get the set of standard
naturals which does not exist. Like the fact that there is no smallest nonstandard natural, there is no smallest finite set containing all
standard sets.

**IST** also illustrates another concept. In set theory we often talk about classes. My experience with this concept was a series of poor
explanations: a class was “the extent of a predicate”, “the ‘collection’ of entities that satisfy a predicate” and a proper class was a
“collection” that was “too big” to be a set like the class of all sets. I muddled on with vague descriptions like this for quite some time before
I finally figured out what they meant. Harking back to my first paragraph, a **class** (in **ZFC**^{1}) is just a (unary) predicate, i.e. a formula (of **ZFC**) with one free variable, or at least it can be represented by such^{2}. A
formula |P(x)| is a **proper class** if there is no set |X| such that |\forall x.P(x) \iff x \in X|. The “class of all sets” is simply the constantly
true predicate. This is a proper class because a set |U| such that |\forall x.x \in U| leads to a contradiction, e.g. Russell’s paradox by
warranting |\{x \in U \mid x \notin x\}| via the Axiom of Separation. It may have already dawned on you that |\mathsf{St}(x)| is a class in
(**min**)**IST** and, in fact, a *proper* class. All of those times when I said “there is no set of standard naturals/nonstandard
naturals/infinitesimals/standard sets”, I could equally well have said “there is a proper class of standard naturals/nonstandard
naturals/infinitesimals/standard sets”. The result of the previous paragraph means the (proper) class of standard sets – far from being
“too big” to be a set – is a subclass of (the class induced by) a finite set! Classes were never about “size”^{3}.

To reiterate, **IST** is a *conservative extension* of **ZFC**. Focusing first on the “conservative” part, when we translate the theorem that
there is a finite set containing all standard sets to **ZFC**, it becomes the tautologous statement that every finite set is a subset of some
finite set. This starts to reveal what is happening in **IST**. However, focusing on the “extension” part, *nothing in* **ZFC** *refutes any
of* **IST**. To take a Platonist perspective, these nonstandard sets could “always have been there” and **IST** just finally lets us talk about
them. If you were a Platonist but didn’t want to accept these nonstandard sets, the conservativity result would still allow you to use **IST**
as “just a shorthand” for formulas in **ZFC**. Either way, it is clear that the notion of “finite set” in **ZFC** has a lot of wiggle room.

Some set theories have an explicit notion of “class”.↩︎

The point is that any logically equivalent formula would do, i.e. classes are equivalence classes of formulas under logical equivalence. One could arguably go further and say that the two formulas represent the same class even if we can’t

*prove*that they are logically equivalent as long as we (somehow) know they are “true” for exactly the same things. This extensional view of predicates is what the “extent” stuff is about.↩︎On the other hand, the Axiom Schema of Specification states that any subclass of a set is a set in ZFC, so for ZFC no proper class can be contained in a set. In fact, many non-traditional systems can be understood as only allowing certain subclasses of “sets” to be “sets”. For example, in constructive recursive mathematics, we may require a “set” to be recursively enumerable, but we can certainly have subclasses of recursively enumerable sets that are not recursively enumerable. See also this discussion on the Axiom of Choice and “size”.↩︎