Skolem's paradox up close and personal

Vaughan Pratt

Thoralf Skolem is my great-great-grandadvisor, having advised Øystein Ore who advised Marshall Hall who advised my advisor Don Knuth. Skolem turned 120 this year, on May 23, and I thought it appropriate to reflect on some contribution of his. Having taught a freshman seminar titled "Paradox: Bug or Feature?" at Stanford in 1997, I chose Skolem's Paradox, that set theory has countable models even though the power set of the natural numbers is uncountable!

Cantor showed in effect that for every set X, the power set ℘X consisting of all subsets of X is a strictly bigger set. It is easy to exhibit an injection (one-to-one function) from X to ℘X: just map each x in X to the singleton {x}. What Cantor showed is that there cannot exist an injection f: ℘X → X in the other direction: every f of this form must send two subsets of X to the same element of X.

One version of Cantor's proof goes as follows. Suppose to the contrary that f: ℘X → X is an injection. Let Z = {f(Y)| f(Y) ∉ Y}. Then f(Z) ∉ Z if and only if f(Z) ∈ Z, a contradiction, whence no such injection can exist. (Injectivity of f is needed for the "if" direction: without injectivity we could have f(Z) ∈ Z on account of some Y for which f(Y) = f(Z) but f(Y) ∉ Y.)

A set is called countable when it can be put in bijection with a set of natural numbers. In particular the set ℕ of all natural numbers is itself countable, being in bijection with itself via the identity function. Cantor's theorem shows that the set ℘ℕ of all sets of natural numbers cannot be put in bijection with any set of natural numbers or we would have an injection from ℘ℕ to ℕ. So ℘ℕ is uncountable.

Cantor's proof can be carried out in Zermelo's formalization of set theory. This formalization is expressed using variables, quantifiers, logical connectives (AND, OR, NOT), and parentheses, all of which are assumed to have their standard meanings, along with a binary relation symbol ∈ for set membership. This last symbol is the only one whose interpretation is constrained by Zermelo's axioms

A model of a theory such as Zermelo's consists of a set of elements and an interpretation of each nonlogical symbol in the language of the theory. Since ∈ is the only nonlogical symbol in Zermelo's axiomatization, it follows that a model of Zermelo's axioms is a set U and a binary relation E on U, that is, a subset of U×U. U is the universe of sets in the model. E constitutes the interpretation of the symbol ∈ in this model. We can picture E as a square (U×U) matrix of 0's and 1's. For any two elements X,Y of U, there is a 1 in row X and column Y of the matrix E just when X is a member of Y.

System Z, Zermelo's axioms, imposes the following constraints on E.

1. Extensionality. If two sets have the same elements then they are the same set. In terms of the matrix E this says that all columns are distinct.

2. Pairing. For all X,Y there exists {X,Y}. This says that E must contain every possible column having one or two 1's (the case of a single 1 arises when X = Y).

3. Comprehension. This says that for every set X and every first order predicate expressible in the language of Zermelo's axioms there exists a set consisting of those elements of X satisfying that predicate. That is, given a column (not necessarily in E) definable by some predicate on U, its intersection with any column of E must be a column of E. In particular the false predicate defines the column of all zeros, the intersection of which with any column is zero. This corresponds to the empty set, which therefore belongs to U.

4. Union. For any set X there exists a set ∪X consisting of the members of the members of X. That is, E must include all the columns of the relation E², the composition of E with itself. It then follows from pairing that the columns of E are closed under binary disjunction: X∪Y = ∪{X,Y}. It further follows that every possible column with a finite number of 1's must be present in E, with comprehension supplying the empty set.

5. Power set. For all X there exists ℘X such that for all Y in U, Y ⊆ X if and only if Y ∈ ℘X. In E, ℘X is that column with a 1 in row Y, for any Y in U, just when column X has a 1 in every row that column Y does.

6. Foundation. Any nonempty set X must contain an element Y containing no elements of X. It follows that no set can be a member of itself. For if Y is such a set then the set X = {Y} is a nonempty set every element of which contains an element of {Y}, since {Y} has just the one element Y which contains Y as a member. In terms of E, the diagonal of E, those entries at row X and column X for all X in U, must be zero. It follows that there cannot exist a column of all ones, that is, U cannot be a set since that would make it a member of itself.

7. Infinity. There is a set ℕ containing the empty set 0 and closed under successor s(X) = X ∪ {X}. By foundation the sets 0,s(0),s(s(0)),… must all be distinct. Hence ℕ must include all these sets, and therefore must be infinite. In terms of the matrix E, column ℕ has infinitely many ones.

These seven axioms constitute Zermelo's axiomatization of set theory, which is therefore denoted Z. There is an eighth axiom called Replacement due to Fraenkel, defining ZF for Zermelo-Fraenkel. Adding a further axiom called Choice then defines ZFC, Zermelo-Fraenkel with Choice. However neither of these axioms change the nature of Skolem's paradox and we can therefore ignore them and stick to axiom system Z.

Now these axioms are all expressed in first order logic. While it is an open question whether ZF has a model, it is straightforward to produce a model of Z. What can such a model look like?

The Löwenheim-Skolem theorem concerns models of first-order or elementary theories. In modern language it says that any structure with a countable language has a countable elementary substructure. In this context "elementary" means that the substructure satisfies every first-order sentence (in the specified language) that holds of the structure.

Leopold Löwenheim first proved the theorem in 1915. In 1920 Thoralf Skolem greatly simplified Löwenheim's proof by using what we now call Skolem functions. In a sentence such as ∀x∃y&phi(x,y), noting that the choice of y depends on x, we can make the choice concrete by rephrasing the sentence as ∀x&phi(x,f(x)) where f is the Skolem function that chooses y appropriately given x. Given any consistent first-order theory expressed in a countable language, Skolem showed how to construct a countable model of that theory from the countably many terms expressible in the language augmented with these Skolem functions.

Since Z has a model, and its language is not only countable but finite, having only the one nonlogical symbol ∈, it follows from the Löwenheim-Skolem theorem that Z has a countable model.

Skolem's paradox asks, how can a theory such as Z that proves the existence of an uncountable set have a countable model?

Although the axioms of set theory pin down exactly much of the behavior of sets, there is one intuitively obvious property that is missing, namely that of closure under taking subsets. In any model (U,E) of Z it is natural to expect that any column obtainable from a column of E by changing some of its ones to zeros would itself be a column of E. However there is no way of saying this. Furthermore if there were it would contradict the Lowenheim-Skolem theorem, since if E contained a column with infinitely many ones it would then have to have uncountably many columns, making the model uncountable.

It is these missing columns that allow Skolem's paradox. In the external view of a countable model of Z, both ℕ and ℘ℕ must be countable. Hence there does exist an injection f: ℘ℕ → ℕ, visible to us external observers. We can then use that injection as in Cantor's proof to construct the set Z = {f(Y)| f(Y) ∉ Y} as a subset of ℕ. The proper conclusion at this point is not the contradiction Cantor's argument depends on, but instead that Z is one of the missing columns.

Does this undermine Cantor's argument? Well, this is a very nice question for philosophers, who might want to argue against first order logic on this ground. Why should we trust theorems when for all we know they are referring to a defective model, or at least a different model from the one we had in mind?

Here are two competing resolutions of Skolem's paradox, both of which preserve the soundness of Cantor's argument. The first is a commonly suggested syntactic solution, the second a somewhat less commonly suggested semantic one, the essential feature of which is that the number of sets in the model can vary but the sets themselves cannot vary with respect to either their members or their subsets.

The Logical Scenario One resolution of Skolem's paradox starts from the observation that Skolem's observation only becomes paradoxical when we mix concepts from different models. Our claim above that ℘ℕ and ℕ are countable in a countable model of Z is made in another model of set theory, one in which the universe U of the countable model is assumed to be a set in order to talk about it as a model. Since U is provably not a set in the countable model, already we have an inconsistency between these two models of set theory, so it is not surprising to find other inconsistencies such as that implied by Skolem's paradox.

In particular the injection we found from ℘ℕ to ℕ was an object of the external model. The proper interpretation of Cantor's theorem is that, in any model of set theory, there can be no such injection. And indeed when we examine all the sets of the countable model, none of them represent an injection from ℘ℕ to ℕ. Invoking a function from the external view of the model is like using "pain" to mean "hurt" in France.

In the logical or syntactic scenario, the only sets that come up for discussion are those namable in Z. These have certain properties that are provable in Z. In particular each of those sets arising somehow as an infinite power set is uncountable. Those properties do not change from model to model. What can change between models is the number of elements of a particular named set from the point of view of an observer of the model, who may find uncountably many such elements in one interpretation of that name and countably many in another. No significance is attached to these external interpretations however.

The Platonic Scenario A competing resolution is to regard these countable models of Z as clearly nonstandard and not what we had mind by the axioms of set theory. Rather we were thinking of a model that is closed under taking subsets. That is, any column obtainable from a column of E by replacing some of its ones by zeros must itself be a column of E. Call such a model standard.

Unfortunately there is nothing we can say in the language of Z that forces this degree of standardness. If we say "for all Y ⊆ X" the Y's that we quantify over are merely those already in the model: pointing at them cannot make the missing ones appear. If we try to manufacture subsets of X using the axiom of comprehension to delete elements from X, there are only countably many subsets we can produce in that way, since Z has only countably many predicates.

What we can do however is step gingerly outside the bounds of first order logic and insist on interpreting Z only with models that are standard in this sense of being closed under taking subsets. The internal and external notions of ℘X then coincide: every subset of X visible to an external observer arises as an element of U. Hence if column ℘X has infinitely many ones it has uncountably many ones. The fact that there exist nonstandard models of Z in which all sets are (externally) countable is then just a curiosity of our first order axiomatization of set theory.

This requirement of standardness makes the internal sets behave more like the external sets without however making them exactly the same. In particular the universe U of any model of Z, standard or not, is a set in the external view of the model but not an internal set of the model (the axiom of foundation prevents any column from containing all ones, by forcing the diagonal of E to be zero). We can account for this by regarding our external view of any model of Z as being defined in a larger model of Z, one that has additional larger sets including the set of all elements of the model. We can assume that both universes are standard.

This notion of standard model provides a nice middle ground which bars the most egregiously small models of set theory while on the other hand not requiring them to be so large as to run any risk of inconsistency.

But wait. What if the external view of set theory by which we judge the standardness of models of Z is itself not standard? Could there be a yet more external view that reveals previously unrecognized subsets of some set X? Somehow we'd like our notion of standardness to rule this out: we want every subset of X that could possibly exist.

So our notion of standardness should come with some guarantee of maximality. No external view of a standard model should reveal subsets missing from the supposedly standard model.

And so on. One can hold forth at much greater length about Skolem's paradox. Timothy Bays's Ph.D. thesis Reflections on Skolem's Paradox in particular brings more than a hundred pages of high-powered model theory to bear on the topic, at a much higher level than we have engaged here. This is tightened up to 32 pages in its published version The Mathematics of Skolem's Paradox.

Also on the web is an extract from A.W. Moore's article Set Theory, Skolem's Paradox and the Tractatus casting the issue in terms of relative vs. absolute uncountability and the ill-definedness of the scope of the concept of set. It seems to me that Moore, like Bays, turns Skolem's paradox into too complicated a story. Both my scenarios, the logical and the Platonic, show how one can have one's cake and eat it too, by taking countability to be an absolute concept without however committing oneself to a fixed scope for the concept of set.

To vary the scope in the logical scenario, allow any model of Z but always interpret uncountability internally. Since the properties of namable sets are established purely logically, independently of the choice of model, varying the model cannot change those properties.

To vary the scope in the Platonic scenario, by all means bring in new sets, but in doing so don't add any new elements or subsets to already established sets. In this way varying the scope of set theory cannot vary the countability or otherwise of any one set.

The point of the present essay was not so much to suggest one option over another as to expose the underlying logic of Skolem's paradox by showing in as simple a way as possible how Z can entail the existence of infinite sets while still having countable models, and to point out two simple resolutions of the paradox, the logical scenario and the Platonic scenario. As to what else can or should be done about the paradox, let a hundred theses bloom.

Vaughan Pratt