Slides for my AMS talk in Denver on October 9, 2016.
Short Form
A Chu space is a transformable matrix whose rows transform forwards while its columns transform backwards.
Generality of Chu spaces
Chu spaces unify a wide range of mathematical structures, including the following.
Algebraic structures can be reduced to relational structures by a technique described below. Relational structures constitute a large class in their own right. However, when topology is added to a relational structure, the topology cannot be incorporated into the relational structure itself but must continue to be given by open sets.
Chu spaces offer a uniform way of representing relational and topological structure simultaneously. They generalize topological spaces, and the same machinery that represents topological structure serves at the same time to represent relational structure.
Definition
Surprisingly this degree of generality can be achieved with a remarkably simple form of structure. A Chu space (X,r,A) consists of just three things: a set X of points, individuals, or subjects, a set A of states or predicates, and a lookup table or matrix r: X×A → K which specifies for every subject x and predicate a the value a(x) of that predicate for that subject. The value of a(x) is given by the matrix r as the entry r(x,a). These matrix entries are drawn from a set K.
K can be as simple as {0,1}, as when representing topological spaces or Boolean algebras. Or it can be as complex as the set of all complex numbers, as when representing Hilbert spaces. Or it can be something in between, such as the set of 16 subsets of {0,1,2,3} when representing topological groups. The full generality of Chu spaces derives from the ability of predicates to take other values than simply true or false (the case K={0,1}).
This definition can be reconciled with the short form definition above by taking A to be a subset of K^X, namely by representing the predicate a as the function λx.r(x,a): X→K mapping each individual to the value of the predicate on that individual, and taking A to be the set of all such functions from X to K. In this case there is no separate matrix, just the two sets consisting respectively of individuals and predicates. However this breaks the symmetry of subject and predicate. The more symmetric matrix encoding of this information makes it equally reasonable to view X as a subset of K^A when this is helpful, bearing in mind however that the matrix may contain repeated rows and/or columns, i.e. in neither case need these be extensional subsets of K^X or K^A.
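As a concrete illustration (the encoding and helper names below are our own, not from the text), a finite Chu space over K = {0,1} is literally just a matrix indexed by points and states, and each state's column is the function λx.r(x,a):

```python
# A minimal sketch of a Chu space (X, r, A) as a plain lookup table.
from itertools import product

def make_chu(points, states, r):
    """Package (X, r, A); r maps each (point, state) pair into some set K."""
    return {"X": list(points), "A": list(states),
            "r": {(x, a): r(x, a) for x, a in product(points, states)}}

def col(space, a):
    """The column of state a: the function λx.r(x,a) as a tuple over X."""
    return tuple(space["r"][(x, a)] for x in space["X"])

# Example over K = {0, 1}: three points, states given as subsets of X.
X = [0, 1, 2]
A = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
space = make_chu(X, A, lambda x, a: 1 if x in a else 0)

print(col(space, frozenset({0, 1})))  # (1, 1, 0)
```

The matrix view keeps subjects and predicates symmetric: rows are the functions X→K and columns the functions A→K, with neither taken as primitive.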
A more satisfactory reconciliation with the short form definition is to view the subject-predicate relationship as a symmetrically expressed relation r. We do not have to interpret the notion of predicate on a set X of subjects as a function a:X→K, any more than we have to interpret a subject as a function x:A→K. Instead we have three options: either of those two, or just leaving r as the symmetric expression of the relationship. Unlike both algebras and topological spaces, Chu spaces do not take subjects to be primitive and predicates to be derived, but rather take both to be primitive.
In this symmetric view, ``subject'' is more natural than ``individual.'' One envisages an individual as having an independent existence. A subject on the other hand is a subject of something: it forms one half of an elementary proposition r(x,a) that combines a subject x with a predicate a.
This reconciliation has a historical link with the discovery and resolution of the paradoxes of Cartesian Dualism. If we identify the sets X and A with Descartes' 1647 division of the universe into physical and mental components respectively, then r is the mediator of these components sought by many philosophers during the following century. The respective proposals of Hume and Berkeley to make one side or the other primitive correspond to taking respectively X or A to be primitive and deriving the other in terms of functions from the former to K. That Hume won out would seem to be correlated with mathematics' preference for basing mathematical objects on their constituent individuals rather than their constituent predicates.
If we view the open sets of a topological space as its permitted predicates defining its structure, a topological space is an example of an object structured by the interaction of its subjects and predicates. It is by no means the only example however. Although both abelian groups and vector spaces are traditionally represented as algebras, they also have well-known alternative representations as what amounts to Chu spaces. Finite (and more generally locally compact) abelian groups are representable as the sets consisting of their elements and their homomorphisms (continuous in the infinite case, where locally compact becomes necessary) into the multiplicative group of nonzero complex numbers. And vector spaces V over a field F are representable as the sets of points and dual points of V, with the latter defined as the set of functionals on V, or linear transformations V→F where F in this context denotes the one-dimensional vector space over F.
Now K is always given merely as a set, with no particular structure of its own. In particular the case K={0,1} is not presumed to have the structure of a Boolean algebra or even a lattice, while when K is the set of complex numbers it is not assumed to be a field.
So a Chu space amounts to just a matrix r over a plain set K. As such it hardly seems like a general or powerful notion. The spark that animates Chu spaces is the manner in which they transform. Whereas a set X transforms into a set Y with a single function f: X→Y, a Chu space (X,r,A) transforms into a Chu space (Y,s,B) with a pair of functions f: X→Y, g:B→A. These functions are not entirely independent, being required to satisfy s(f(x),b) = r(x,g(b)) for all x in X and b in B, called the adjointness condition. We call f and g adjoints of each other.
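The adjointness condition can be checked mechanically. A hedged sketch (the tuple encoding and names are ours, not from the text): each Chu space is a triple (points, states, matrix), and a candidate pair (f, g) is a Chu transform exactly when every entry agrees.

```python
# Check the adjointness condition s(f(x), b) = r(x, g(b)) for a pair
# f: X -> Y (forward on points), g: B -> A (backward on states).
def is_chu_transform(src, tgt, f, g):
    X, A, r = src
    Y, B, s = tgt
    return all(s[(f[x], b)] == r[(x, g[b])] for x in X for b in B)

# Two tiny spaces over K = {0, 1} with identity matrices.
src = (["x0", "x1"], ["a0", "a1"],
       {("x0", "a0"): 1, ("x0", "a1"): 0, ("x1", "a0"): 0, ("x1", "a1"): 1})
tgt = (["y0", "y1"], ["b0", "b1"],
       {("y0", "b0"): 1, ("y0", "b1"): 0, ("y1", "b0"): 0, ("y1", "b1"): 1})

f = {"x0": "y0", "x1": "y1"}   # forward half
g = {"b0": "a0", "b1": "a1"}   # backward half, its adjoint
print(is_chu_transform(src, tgt, f, g))  # True
```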
In general there is no requirement that the rows and columns of the Chu space matrix be distinct. When they are however, each of the two functions f,g of a Chu transform determines the other. This can be seen by associating with f the matrix m(x,b) = s(f(x),b). By adjointness of f,g, m(x,b) = r(x,g(b)) as well. Thus each of f and g determines m.
Now if all the columns of (X,r,A) are distinct, that is, if (X,r,A) is extensional, there is only one possible function g: B→A that can satisfy m(x,b) = r(x,g(b)) for all x and b. Hence under that assumption, m determines g, whence so does f. Dually, if all the rows of (Y,s,B) are distinct, that is, if (Y,s,B) is separated, there is only one possible f: X→Y that can satisfy m(x,b) = s(f(x),b). So under that assumption, m determines f, whence so does g.
So actually f and g are far from independent. Even without any requirement of distinct rows or columns, if we know f then g is almost completely determined: the only degree of freedom it has is in choosing one column from a set of columns all having the same entries. We say that such columns are isomorphic. Two Chu spaces are equivalent when they have the same sets of rows and columns up to isomorphism, without regard for the identities of the subjects or predicates.
We interpret r(x,a)=1 as meaning x∈a (x is a member of a), and r(x,a)=0 as x∉a (x is not a member of a).
No additional restriction on the notion of Chu transform is needed, as it exactly expresses the standard notion of continuity for ordinary topological spaces. It achieves this as follows. The adjointness condition on (f,g) asserts x∈g(b) ⇔ f(x)∈b, that is, g(b) = {x|f(x)∈b}. But this is exactly the notion of inverse image of f. The requirement that g map B to A then asserts that the inverse image of each open set of (Y,s,B) is an open set of (X,r,A).
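Over K = {0,1}, with states read as open sets, this means checking Chu-continuity is checking topological continuity. A sketch (our own encoding): the backward half of a transform is forced to be inverse image, so a point map f is continuous exactly when every open set of the target pulls back to an open set of the source.

```python
# Inverse image of a target open set b under a point map f: X -> Y,
# with f given as a dict on the points of X.
def inv(f, b):
    return frozenset(x for x in f if f[x] in b)

# Source: three points with opens {}, {2}, {1,2}, {0,1,2}.
opens_X = [frozenset(), frozenset({2}), frozenset({1, 2}), frozenset({0, 1, 2})]
# Target: two points with opens {}, {1}, {0,1}.
opens_Y = [frozenset(), frozenset({1}), frozenset({0, 1})]

def continuous(f):
    return all(inv(f, b) in opens_X for b in opens_Y)

print(continuous({0: 0, 1: 0, 2: 1}))  # True: every preimage is open
print(continuous({0: 1, 1: 0, 2: 0}))  # False: preimage of {1} is {0}, not open
```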
Although we required topological spaces to be extensional, we did not require them to be separated: they are permitted to have repeated rows. A topological space defined in this way is said to be T0 just when it is separated. The usual definition of a T0 space is one such that for any two distinct points there exists an open set containing one but not the other. But this can be seen to be the same thing as requiring all rows to be distinct.
Open sets as predicates. The Sierpinski space is the topological space having two points and one open set of each cardinality 0, 1, and 2. The point in the singleton open set is called 1 and the other point 0. The continuous functions from a topological space to the Sierpinski space are exactly the characteristic functions of the former's open sets; they can be viewed as the permitted predicates on the former space.
Chu spaces generalize topological spaces analogously to the way topological spaces generalize partially ordered and preordered sets, namely by imposing fewer restrictions. To appreciate the former it is helpful to understand the latter, which we explain here.
A preorder (X,≤) can be understood as a restricted kind of topological space, namely its Alexandrov topology, having as its open sets the order filters or up-sets of (X,≤). These are those sets a such that y≥x ∧ x∈a → y∈a. It can be verified that both the union and the intersection of any set of order filters is an order filter; in particular the empty set and the whole space are order filters, whence the Alexandrov topology so defined is indeed a topological space.
Conversely a topological space determines a preorder called the specialization order of the space. This is simply the coordinatewise ordering of the rows: x≤y is defined as ∀a[x∈a → y∈a]. This binary relation is clearly reflexive and transitive and hence a preorder. Furthermore it is antisymmetric, and hence a partial order, if and only if the space is T0. The important point here is that distinct (i.e. nonisomorphic) topological spaces can have the same (i.e. isomorphic) specialization orders, as we will illustrate below with N∪{∞}. Topology is more expressive than order in that it permits distinctions to be drawn that order alone cannot.
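The specialization order can be read straight off the columns of the {0,1}-valued matrix; a sketch (function name ours), applied to the two-point Sierpinski space described below:

```python
# Specialization preorder: x <= y iff every open set containing x contains y,
# i.e. the row of x is coordinatewise below the row of y.
def specialization(points, opens):
    return {(x, y) for x in points for y in points
            if all((x not in a) or (y in a) for a in opens)}

points = [0, 1]
opens = [frozenset(), frozenset({1}), frozenset({0, 1})]  # {1} is the open point
order = specialization(points, opens)
print(sorted(order))  # [(0, 0), (0, 1), (1, 1)]
```

Here the space is T0, and accordingly the specialization preorder is antisymmetric: 0 ≤ 1 but not 1 ≤ 0.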
When the open sets are closed under arbitrary intersection (in addition to arbitrary union), the space is uniquely determined by its specialization order as being the Alexandrov topology of that order. Conversely any Alexandrov space, one whose open sets are closed under arbitrary intersection, is the Alexandrov topology on its specialization order.
All finite topological spaces are trivially Alexandrov spaces. The simplest example of a non-Alexandrov space is obtained by starting with the Alexandrov topology on the poset N∪{∞} standardly ordered as 0≤1≤2≤…≤∞, and removing the open set {∞}. When this open set is present, the predicate (as a continuous function to the Sierpinski space described above) that is false (0) on the natural numbers but true (1) on ∞ is permitted, as the inverse image of {1} is then {∞}. That is, {0} is a closed set whereas {1} is not. When {∞} is absent as an open set however, any function mapping all the natural numbers to 0 must also map ∞ to 0, i.e. to the same closed set {0} as where the natural numbers went. This topology is called the Scott topology on N∪{∞}.
This demonstrates the role of omitting closure of the open sets under infinite intersection, namely to permit expressing a quantity such as ∞ as the limit of an infinite sequence. The Sierpinski space is a primitive yet sufficient target for expressing this limit property; for any other target, if all finite numbers greater than some value land inside some closed set however small, then ∞ must also land inside that closed set. When the open sets are closed under arbitrary intersection, as with the Alexandrov topology of a poset, it ceases to be possible to link the fate of one point under a function to that of an infinite sequence of points.
Thus this additional flexibility of topological spaces over posets makes them more expressive. By allowing yet more possibilities, such as not being closed under arbitrary union for starters, Chu spaces are in turn more expressive than topological spaces.
As a simple example, the two point Chu space having two open sets, namely the two singletons, has for its continuous functions on itself just the two permutations of the two points. Unlike topological spaces, for which every constant function is continuous, here neither of the constant functions are continuous.
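This claim is small enough to verify exhaustively; a sketch (encoding ours) that enumerates all four self-maps of the two-point space whose only states are the two singletons:

```python
from itertools import product

points = [0, 1]
states = [frozenset({0}), frozenset({1})]  # no empty set, no whole space

def continuous(f):
    # f is continuous iff every state pulls back to a state.
    return all(frozenset(x for x in points if f[x] in b) in states
               for b in states)

for f_vals in product(points, repeat=2):
    f = dict(zip(points, f_vals))
    print(f_vals, continuous(f))
# The constants (0, 0) and (1, 1) fail: the preimage of a singleton under a
# constant map is either empty or all of the space, neither of which is a
# state here.  The identity (0, 1) and the swap (1, 0) succeed.
```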
Sets are unstructured objects, like an unfurnished house. Structure has traditionally been provided in ad hoc ways according to need, like adding sofas and beds for a residence or desks and computers for an office. Mathematics furnishes its objects with operations to form algebras, relations to form relational structures, or topology to form topological spaces.
Chu spaces provide an elementary way of furnishing sets with structure that subsumes all the principal structuring techniques currently in use, both elementary and sophisticated. In so doing they remove the walls that divide up mathematics into its categories, and introduce many new structures previously unknown to mathematics, to create a new, universal, and homogeneous mathematical landscape. Every mathematical object is representable as a Chu space whose transformability is fully faithful to the transformability of the object it represents.
The extant mathematical categories tend in practice to group objects by stiffness. Sets are the most flexible, having the granularity of sand. Vector spaces are in the middle: picture a block of rubber. Boolean algebras are the stiffest.
Some categories have greater diversity of stiffness than others: partially ordered sets range in stiffness from that of sets to that of vector spaces, while the stiffness of distributive lattices ranges from that of vector spaces to that of Boolean algebras.
Chu spaces span the whole gamut of stiffnesses; we have called this the Stone Gamut in honor of Marshall Stone, who discovered that the dual of a Boolean algebra was a Stone space, and that of a distributive lattice an ordered Stone space. (He of course did not call them that, but identified them nonetheless by precisely and fully describing them in the language of topology.)
Only a few standard categories of mathematical object share with Chu spaces this property of being self-dual. Among the better-known ones are finite abelian groups, finite-dimensional vector spaces, sets transformed by binary relations, finite chains with bottom (top also works), and (as a common generalization of the previous two) complete semilattices.
Chu spaces are important to the foundations of mathematics because they demonstrate that when one has stepped back to view the mathematical landscape from a sufficient distance, a global symmetry appears, duality, that is not apparent when standing inside any particular category. This is like the difference between natural numbers and integers: with the former there is no operation of negation, which only springs into existence to create a symmetry when the view is broadened to include the negative numbers.
Further enlarging the universe of numbers to bring in the rationals, the reals, or the complex numbers, does not harm the basic symmetry of negation. This feature of the universe of all numbers becomes apparent when you step back just far enough to see the integers. (Seeing the rationals then the reals entails then stepping closer again without losing the panorama of the integers.) By the same token, once the basic universe of Chu spaces has been created, further expansion to yet larger universes (at least when they are constituted from yet more general Chu spaces) does not destroy the symmetry of duality.
We are therefore dealing with the duality of points and states. Now it is eminently reasonable to think of points as physical and states as mental. This would make duality for Chu spaces a sort of duality between the mental and the physical.
When we introduced Chu spaces near the start of this page, we indicated their generality in being able to represent a wide range of mathematical objects. These representations are concrete in the sense that we represent algebras and relational structures having underlying set X as a Chu space (X,r,A) having the same underlying set, and having a set A of states purporting to represent the same structure as the algebra or relational structure.
Our test for whether the structure has been fully and faithfully represented will be whether the continuous functions between the Chu spaces doing the representing are the same functions as the homomorphisms between the structures they represent.
We can reduce the case of algebras to that of relational structures by the technique familiar from first order logic of representing an n-ary operation f: X^n → X as an (n+1)-ary relation R. This is accomplished by taking R to consist of all tuples of the form (x1,…,xn,f(x1,…,xn)), one such tuple for every n-tuple (x1,…,xn). For example the addition operation + of the algebra (N,+) of natural numbers under addition can be represented as the ternary relation R consisting of all triples (x,y,z) in N³ satisfying x+y=z.
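A sketch of this reduction (function and variable names ours), using addition mod 4 rather than + on all of N so that the relation stays inside a finite carrier:

```python
from itertools import product

def graph_relation(op, carrier, arity):
    """Encode an n-ary operation as the (n+1)-ary relation of its graph:
    all tuples (x1, ..., xn, op(x1, ..., xn))."""
    return {xs + (op(*xs),) for xs in product(carrier, repeat=arity)}

# Addition mod 4 on the carrier {0,1,2,3}.
carrier = range(4)
R = graph_relation(lambda x, y: (x + y) % 4, carrier, 2)

print((1, 2, 3) in R, (1, 2, 0) in R)  # True False
```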
We begin by showing how to represent a relational structure (X,R) having just one n-ary relation R, that is, R is a subset of X^n.
Let n denote the set {1,2,…,n}. We will represent the n-ary relational structure (X,R) as a Chu space over 2^n. This is stated more precisely as follows.
Theorem. The category of n-ary relational structures and their homomorphisms is a concretely full subcategory of Chu(Set, 2^n).
``Concretely'' means that the representing Chu spaces have the same underlying sets as the structures they represent, and that the representing functions between them are the same functions as the homomorphisms they represent. (Hence the representation is faithful: distinct homomorphisms are represented by distinct continuous functions). ``Full'' means that every continuous function between Chu spaces representing structures represents a homomorphism between the represented structures.
Proof. Represent (X,R) as the Chu space (X,r,A) where A consists of those n-tuples of subsets (a1,a2,…,an), with each ai ⊆ X, such that for every n-tuple (x1,x2,…,xn) in R, there exists i such that xi ∈ ai. This makes A of type (2^X)^n, but this is isomorphic to (2^n)^X, which is the desired type K^X, noting that K = 2^n.
This representation is concrete, having the same carrier as (X,R). To see that the representation is full and faithful, we must show that any function f: X→Y is a homomorphism from (X,R) to (Y,S) if and only if it is a continuous function from the Chu space (X,r,A) representing (X,R) to the Chu space (Y,s,B) representing (Y,S).
(Only if) We first show that homomorphisms are continuous. For a contradiction let f: X→Y be a homomorphism which is not continuous. Then there must exist a state (b1,…,bn) in B for which (f^{-1}(b1),…,f^{-1}(bn)) is not a state in A. Hence there exists (x1,…,xn) ∈ R for which xi ∉ f^{-1}(bi) for every i. But then f(xi) ∉ bi for every i, whence (f(x1),…,f(xn)) ∉ S, impossible because f is a homomorphism.
(If) We now show that continuous functions are homomorphisms. That is, given (x1,…,xn) ∈ R we must have (f(x1),…,f(xn)) ∈ S. For if not then ({f(x1)}',…,{f(xn)}') is a state in B, where {f(xi)}' denotes all of Y except the one element f(xi). Then by continuity, (f^{-1}({f(x1)}'),…,f^{-1}({f(xn)}')) is a state of (X,r,A). Hence for some i, xi ∈ f^{-1}({f(xi)}'), i.e. f(xi) ∈ {f(xi)}', which is impossible. QED
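The theorem is checkable by brute force on a tiny case. A sketch (all names ours): for a single binary relation on a two-point set, we build A exactly as in the proof and confirm that the homomorphisms among the four self-maps are precisely the continuous ones.

```python
from itertools import product, combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def states(X, R):
    # A = pairs (a1, a2) of subsets of X such that every (x1, x2) in R
    # meets them: x1 in a1 or x2 in a2.
    return [(a1, a2) for a1, a2 in product(powerset(X), repeat=2)
            if all(x1 in a1 or x2 in a2 for (x1, x2) in R)]

def is_hom(f, R, S):
    return all((f[x1], f[x2]) in S for (x1, x2) in R)

def is_continuous(f, X, A, B):
    inv = lambda b: frozenset(x for x in X if f[x] in b)
    return all((inv(b1), inv(b2)) in A for (b1, b2) in B)

X = [0, 1]
R = {(0, 1)}          # one binary relation with a single tuple
A = states(X, R)
for f_vals in product(X, repeat=2):
    f = dict(zip(X, f_vals))
    assert is_hom(f, R, R) == is_continuous(f, X, A, A)
print("homomorphism and continuity agree on all four self-maps")
```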
The above caters for the case of a structure with a single relation. A more general case allows multiple relations, as in (X,R1,R2,…,Rn) where the i-th relation Ri has arity αi.
This case is easily reduced to the single-relation case by first forming R' as the product ΠiRi. R' is the set of all n-tuples whose i-th component is some αi-tuple from Ri. Take R to be the set of ``flattened'' n-tuples, each resulting from concatenating the constituent αi-tuples of R' to form a single composite tuple of arity Σiαi.
Claim: The homomorphisms between single-relation structures (X,R) formed in this way coincide with the homomorphisms between the n-relation structures (X,R1,R2,…,Rn) they were derived from. This is because preserving the composite tuples is equivalent to preserving each of their constituent subtuples from the individual relations Ri.
This can be further extended to multisorted relational structures (X1,…,Xm; R1,…,Rn). Here each Ri is a subset of some mixed product of the Xj's. Homomorphisms between such structures are m-tuples (f1,…,fm) of functions fj: Xj→Yj.
This case is reducible to the previous single-sorted case by replacing the Xi's by the disjoint sum X = ΣiXi. In effect this erases the type information that distinguishes the sorts from one another, thereby collecting all the elements into one undifferentiated set X. The m-tuples of functions can then be viewed as a single function from X to Y. We represent this single-sorted version as before.
However the resulting single-sorted continuous functions, though respecting the tuples, need not preserve sorts. That is, we have not prevented f(x) from being an element of Y of sort different from that of x. We enforce that requirement by replacing the K arising in the single-sorted construction by K×m where m = {1,2,…,m}, and replacing the matrix r: X×A → K arising in that construction by r': X×A → K×m defined as r'(x,a) = (r(x,a),j) where x was originally in Xj. The adjointness condition s(f(x),b) = r(x,g(b)) then forces f(x) to have the same type as x.
Apart from crossword puzzles being real, is there anything else about Chu transforms that has anything to do with the real world? Well, suppose you are writing an email with your favorite editor and you have just entered the first three letters, say "cat", when you realize you need to go back and delete the first letter. You have transformed the 3-letter string "cat" to the 2-letter string "at".
Now think of the set of all 3-letter strings as the individuals of a Chu space whose "states" are pos1, pos2, pos3, with the value of state pos1 at any given string being the letter in position 1, and so on for the other positions. If your alphabet has 26 letters then these states have 26 "truth values" and that set will have 26×26×26 = 17576 strings of length 3.
Likewise think of the set of all 2-letter strings as the individuals of a Chu space whose states are pos1 and pos2. This is a smaller set with only 26×26 = 676 strings of length 2.
The act of deleting the first letter can be understood in terms of a function from strings of length 3 to strings of length 2. But it can also be understood as a function from {pos1,pos2} of the target to {pos1,pos2,pos3} of the source, namely the function that maps pos1 (in the target) to pos2 (in the source) and pos2 (in the target) to pos3 (in the source).
The latter function tells your editor how to obtain the new string from the old one. The letter at pos1 now must be whatever it was before at pos2, and likewise pos2 must have come from pos3.
So the simple act of deleting a letter from a text buffer is a Chu transform. In one world it transforms one set of strings to a smaller set of strings (smaller by a factor of the size of the alphabet). In the dual world, at the same time it is busy transforming one set of positions backwards in time to another, by explaining to the editor how to obtain the letters in the new string from those of the old.
The forward-looking function is the transformation you thought you were applying to the space of strings, taking you from the past to the future. The backward-looking one is the hidden historian, constructing the future by looking back into the past. Each determines the other, and together they constitute the Chu transform you realized by hitting backspace.
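The two halves of this backspace transform can be written down and checked against the adjointness condition. A sketch (encoding ours, with 0-indexed positions and a 2-letter alphabet to keep the spaces small):

```python
from itertools import product

# Points: strings; states: positions; matrix entry r(w, p) = w[p].
alphabet = "ab"
strings3 = ["".join(t) for t in product(alphabet, repeat=3)]

f = {w: w[1:] for w in strings3}   # forward: delete the first letter
g = {0: 1, 1: 2}                   # backward: target position <- source position

# Adjointness s(f(w), p) = r(w, g(p)) reads: f(w)[p] == w[g[p]].
print(all(f[w][p] == w[g[p]] for w in strings3 for p in (0, 1)))  # True
```

The backward map g is exactly the "hidden historian" of the text: it tells the editor that the letter now at position 1 came from position 2, and the one at position 2 from position 3.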
Anyone who has ever implemented an editor knows that deleting the last letter of the buffer is the easy operation; deleting any other letter requires moving all the following letters up one place to fill in the gap (or some equivalent thereof). We can see this directly from the backwards function, which leaves unchanged the positions before the cursor and subtracts one from those after.
In the foregoing we called the strings points and the positions states. But when you're staring at a particular string, what you see are letters at positions. So it really seems as though the positions should be the points and the letters at those positions the local state at each position, with the whole string being the global state of the text buffer. How did we end up with them the other way round?
In fact we could have transposed this matrix to make the positions points and the letters states. But then the Chu transform for deleting the first letter would point backwards in time, which is counterintuitive. How do we reconcile the conflicting results of these two points of view?
When we are processing information we are transforming states of the buffer, where a state conveys information. Such a transformation can be regarded as mental in nature. If we try to view information-manipulating operations in terms of physical positions of letters, transformation seems to be going backwards. But if we view it in terms of mental states it goes forwards, which is the customary direction we associate with transformations. So evidently (if it wasn't already obvious), deleting a letter is fundamentally an information-processing or mental activity rather than a physical one.
Another thing we can do besides deleting information is copying it. Suppose we duplicate the first letter, that is, abc becomes aabc. The Chu transform expressing this editing operation maps the set of 3-letter strings to the set of 4-letter strings. But it also maps the set {pos1,pos2,pos3,pos4} backwards to {pos1,pos2,pos3} by sending both pos1 and pos2 to pos1, pos3 to pos2, and pos4 to pos3.
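As with deletion, this copy operation can be checked against the adjointness condition; a sketch (encoding ours, 0-indexed positions, 2-letter alphabet):

```python
from itertools import product

alphabet = "ab"
strings3 = ["".join(t) for t in product(alphabet, repeat=3)]

f = {w: w[0] + w for w in strings3}   # forward: duplicate the first letter
g = {0: 0, 1: 0, 2: 1, 3: 2}          # backward: both new positions 0 and 1
                                      # come from old position 0

# Adjointness: the letter at each new position equals the letter at the
# old position it came from.
print(all(f[w][p] == w[g[p]] for w in strings3 for p in range(4)))  # True
```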
Once again we find that we are transforming mental entities, namely the possible states of the buffer, forwards in time while transforming the physical positions backwards in time. So copying is also an information-processing or mental transformation. Intuitively obvious of course, but it is nice to see that our intuition is supported by the respective directions taken by the mental and physical halves of the Chu transform representing this particular copy operation.
Are all editing operations mental in this sense? No. Consider the operation of extending a buffer of length 2 by appending one more position. This, we claim, is a physical operation, involving no information processing. All it does is create a new position.
This operation cannot be described as a function from the set of strings of length 2 to those of length 3 because we do not know which string to send say ab to; it could be any of aba or abb or ... Nor is there a function from {pos1,pos2,pos3} back to {pos1,pos2} because we do not know where to send pos3.
On the other hand there is a function from the set of strings of length 3 to those of length 2, which sends each string abc to ab. And this is mirrored by a function from {pos1,pos2} to {pos1,pos2,pos3} which sends each of pos1 and pos2 to itself. So here we have a Chu transform for which the physical half is going in the forward direction. Creating space for letters is physical.
The very act of building a computer memory creates space for information. So the manufacturers of computers are not processing information, they are performing physical actions. Again intuitively obvious, but again it is nice to see this close up and in slow motion.
This section takes what is, for some readers at least, hopefully an entertaining detour into physics. By all means take it with a grain of salt, and bear in mind that nothing in this section should be taken as reflecting poorly on the material outside it; we are here simply to have fun for a little while. While I see nothing fundamentally erroneous in the arguments below, a real physicist might well complain.
So far we have only considered mental states. What about physical states? Aha, here we take the position that there is no such thing. States are states, regardless of whether the points they are states of are part of a brain, part of a computer, or part of a swinging hammer. All states are mental, and the points that coordinatize state vectors are physical.
The counterparts of point and state for physics are time and energy respectively (along with space and momentum respectively but let's keep things simple here). Points in time can reasonably be considered physical, but surely energy is physical too (it's all physics after all).
Mass lives on the energy side of time-energy duality, and indeed is interchangeable with energy via E = mc². Surely mass is physical at least!
Here we really go out on a limb and declare that energy, mass, and for that matter momentum, are all fundamentally mental qualities rather than physical.
Now one place that information does creep into physics is entropy, which except for a minus sign is information. Entropy S is intimately correlated with energy Q via the thermodynamic relationship dQ = T dS. That is, energy flux is in proportion to entropy flux with temperature constituting the constant of proportionality. Rewriting the relationship as T = dQ/dS, we can view temperature as joules (energy) per negative bit (entropy or neg-information).
To make the notion of flux more vivid we can divide both sides of either of the first two equations by dt. The second equation then becomes dS/dt = (dQ/dt)/T. The units on the left are bits per second (data rate) while those on the right are joules per second (power) per degree. For a given level of power, temperature acts to dilute data rate.
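The dilution claim is just arithmetic on dS/dt = (dQ/dt)/T; a sketch in arbitrary units (numbers ours, purely illustrative):

```python
# At fixed power dQ/dt, raising the temperature dilutes the entropy
# (data) rate in inverse proportion.
power = 1.0   # dQ/dt, arbitrary energy units per second
rates = {T: power / T for T in (1.0, 2.0, 4.0)}   # dS/dt at each temperature
print(rates)  # {1.0: 1.0, 2.0: 0.5, 4.0: 0.25}
```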
So what is temperature? When analyzed carefully, temperature of a system of particles, whether those particles form a gas, liquid, or solid, turns out to be a statistical notion expressing where the distribution of energies of the particles of the system is most strongly peaked. For large systems in thermal equilibrium, the law of large numbers makes this peak astonishingly narrow, giving such systems a very well-defined temperature. This is a fundamental fact of statistical mechanics.
The statistical aspect aside, the basic idea with temperature of a system is that it is the prevailing energy of the particles of that system. Now the more energetically particles bang around, the noisier one can imagine things getting. Entropy is the result of this noise "drowning out" the information flow that we call energy flow.
Following our principle that energy is really information, we can understand entropy flow as the dilution of information flow, in the form of energy flow, by the noise that is temperature. This is the real content of dS = dQ/T. As with entropy, there is a minus sign that is needed to complete the connection between energy and information: energy like entropy is negative information.
The significance of all this is that both energy and entropy can be understood as information, respectively before and after dilution by temperature when viewed differentially as flux.
Since mass is just slowly moving energy (and lots of it for even a small amount of mass, c² being a very large number), we have that mass too is information, huge amounts of it, and hence mental.
One situation having a clear-cut connection between mass and information is black holes. Here however we have that mass behaves not as information directly but as the square root of information. The information in a black hole is its area while its mass is the square root of its area. When black holes collide they merge violently to produce a single black hole, the only absolutely irreversible process in all of physics. The information in the resulting black hole in bits is the sum of the number of bits in each of the two colliding black holes.
If the colliding holes contribute 1 bit each then each weighed one mass unit (for the appropriate choice of unit). The resulting 2-bit black hole then weighs 1.414 mass units, with the missing .586 mass units radiated violently away as gravitational waves.
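The merger arithmetic follows from information (area) adding while mass goes as the square root of area; a sketch (unit choices ours, following the text's convention that a 1-bit hole weighs one mass unit):

```python
import math

bits1, bits2 = 1, 1                      # information of the two colliding holes
mass_in = bits1 ** 0.5 + bits2 ** 0.5    # total mass before the collision
mass_out = math.sqrt(bits1 + bits2)      # bits add, mass is sqrt of total bits
radiated = mass_in - mass_out            # carried off as gravitational waves

print(round(mass_out, 3), round(radiated, 3))  # 1.414 0.586
```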