Forem

Roderick Schmogick
Roderick Schmogick

Posted on • Edited on

Collections, Demystified

I am writing this with a view to making headway in wrapping my head around Oracle's Java docs. Simply reading these dry documents is for me a daunting task, to say nothing of comprehending them! But I have found that the information becomes more interesting, and the process of reading more tolerable, when I put in the effort to conceptualize the info in a way that makes sense to me—which makes sense, since info can hardly interest you if you don't understand it.

Fortunately I have some idea as to how I can break this task down into subtasks. At present I am primarily interested in the core Java interfaces. These are the interfaces in the core Java SE APIs, defined in Oracle's API specification for Java SE.

Let us begin with what Oracle identifies as "[t]he root interface in the collection hierarchy": the Collection interface. There's a lot to learn about Collection. In this post we'll focus on what collections and objects of type Collection are.

Collections

According to Oracle, collections are groups of objects called elements. Some collections can contain duplicates of their elements. We call these bags. Other collections cannot. Call them boxes.

(Edit: I forgot Oracle technically says a collection "represents" a group of objects, not that a collection is one. I'll have to think about how that affects what I say below.)

Boxes are like sets in set theory. Just as, in set theory, you can't have more than one of a single object in a set, you can't put more than one of the same object in a box.

Every once in a while you might see multiple instances of the same numeral in notation representing a set, such as {1,2,2,3}.\lbrace 1, 2, 2, 3 \rbrace . But this is just a notational variant of {1,2,3}.\lbrace 1, 2, 3 \rbrace . So is {1,2,3,2},\lbrace 1, 2, 3, 2 \rbrace , or {1,3,2}.\lbrace 1, 3, 2 \rbrace . These are all the same set. Why? Because they have the same elements. There are not actually two 2s2\textrm{s} in the set {1,2,2,3}.\lbrace 1, 2, 2, 3 \rbrace . There's only one, because the number 22 is just a single object. Only one such number exists. No matter how many times you write the numeral 2,\lq 2 \text{\textquoteright}, it always stands for the same number.

Here's a way to see my point. Let xx be 2.2. Then {1,2,2,3}\lbrace 1, 2, 2, 3 \rbrace is {1,x,x,3}.\lbrace 1, x, x, 3 \rbrace. But obviously x\lq x \text{\textquoteright} denotes the same thing wherever it appears. That is, x=x.x = x. So {1,x,x,3}\lbrace 1, x, x, 3 \rbrace can only have three things in it: 1,x,1, x, and 3.3. So by substituting 2\lq 2 \text{\textquoteright} for x,\lq x \text{\textquoteright}, we conclude that {1,2,2,3}\lbrace 1, 2, 2, 3 \rbrace can only have three things in it: 1,2,1, 2, and 3.3.

All this to say, sets never have duplicate elements. Boxes are the same way. Consider the paradigmatic box: an object of type Set (a subinterface of Collection). (From here on out, unless I indicate otherwise, "set" means a Set object.) For any two elements e1 and e2 of a set, !e1.equals(e2). It follows, without loss of generality, that e1 and e2 are not duplicates.

You might doubt these things are true of all sets. Some sets have elements, the thought might go, that lack an equals method. If e1 were one of those elements, the method call e1.equals(e2) would be illegal. So it could not return false as needed for the negation of its return value to be true.

But you would be wrong! Any element of a collection is an object, i.e., an instance of a class. And all classes are subclasses of Object. So they all inherit the equals method from Object. For this reason, no matter the type of e1, e1 has an equals method. How can we be sure though that e2 is a valid argument to pass? We can because it's an instance of Object, as is e1, and the lone parameter of Object's equals method is of type Object.

Null Elements

So it looks like no set can contain multiple copies of an element. But hang on... Can't you put null in a collection? That's not an object. It's a value indicating the absence of an object reference. I'm the one who's wrong then! Not just any element of any collection is an object.

So, what if e1 happens to be null? Then e1 is not an instance of any class, and does not inherit an equals method from Object. In that case it's not strictly true that !e1.equals(e2). But there still are no duplicates of null in the set, according to the documentation for Set. Also, if e1 is null, it is true that e1 and e2 are not such that e1.equals(e2). Why? For those elements to be such that e1.equals(e2), that method call would have to return true. But the invocation is not allowed, so it cannot return true. So the generalization in the documentation still holds true. That is, every set contains no pair of elements e1 and e2 such that e1.equals(e2).

Now we see why Oracle chose to deny any element pair is such that e1.equals(e2), instead of affirming, like I did, that all pairs are such that !e1.equals(e2). Wording matters. But in my defense, what I said was such truthlike, much truthy. Not all pairs of elements make the negation true, but all pairs of objects in any collection do.

There being collections with null elements doesn't square with defining "collections" as groups of objects and "elements" as objects in a collection. Some collections have none but null elements. No null element is an object, though each is in a collection. So, what are collections and elements? To answer this question, we have to consider two things.

First, what is it that all elements of Collection objects have in common, be those elements objects or not? If there is something they all have in common, we may keep calling all Collection objects collections, if we please. If that's what we want, we just have to revise our definition of "collection."

Second, is it true that all Collection objects are collections? Perhaps objects of that type only truly qualify as collections when they contain objects. Then, if a Collection object has only null elements, it's not a collection (group of objects). Assuming that's right, there is no need to revise our definition of "collection," just our notion of which Collection objects the less formal term "collection" applies to. (That's if you have the same notion I had—that all Collection objects are collections.)

I'll take these questions in turn. There is a common feature of all elements of Collection objects. And it's not just that they're elements of Collection objects, of course. Pointing that out doesn't help. We want to know what makes them elements!

The pertinent common feature is their being values. That is, all elements are pieces of data in the form of variables or literals.

Null elements are values known as reference literals. These differ from primitive literals, not with respect to whether they're values, but with respect to whether they are directly assignable—so, assignable without autoboxing—to variables of reference types. These are types like Object and HashSet. We declare variables of these types if we want variables capable of storing objects' memory addresses.

Though primitive literals are values, they cannot be elements. Why not? Because they're neither references nor reference literals. Collection objects can only hold either variables of reference types or values directly assignable to variables of those types. This is because Collection is a blueprint for data structures that are designed to group objects together. Primitives are not objects, nor are they the types of values that point to objects. So there is no need for Collection to accommodate the inclusion of primitives as elements.

But in describing a collection as a group of objects known as its elements, Oracle seemed to be saying a collection's elements are the objects themselves. Not values serving as references to the objects. But then Oracle must mean something different by "element" when it refers to null elements in the documentation for Set.

Suppose we want our usage of "element" to be maximally consistent. Then we're at a crossroads, semantics-wise. Either we stop calling null values elements, or we stop calling objects elements, opting to call values elements instead.

Say we stop calling objects elements. Then we can alternatively define a collection as a group of non-primitive values, and define its elements as those values. This way we can keep calling all Collection objects collections, and keep calling occurrences of null in a collection elements.

If on the other hand we reserve the term "element" for objects, then we have to say a Collection object CC has no elements if all CC "contains" is null. And if CC not only "contains" null, but also contains objects, we must say that those objects CC contains are CsC'\textrm{s} only elements—that null is not one of CsC'\textrm{s} elements. And we must deny that any variable or literal is itself an element of C,C, since no variable or literal is itself an object.

As observed previously, if there are such things as null elements, some Collection objects are not groups of objects—specifically Collection objects with only null elements. So, by Oracle's definition, such objects are not collections. Neither are Collection objects that "contain" null instead of elements, if such objects are possible. They're not groups of objects.

The only way a Collection object CC containing no other objects can be a group of objects is if it's an element of itself. CC would then be like a singleton set (in set theory) in that CC has only one object as an element. But CC would differ from a singleton in that (i) if we count null as an element and CC "contains" null, CC has at least two elements, and (ii) CC is in C.C . (In Zermelo-Fraenkel set theory, no set is a member of itself.)

We might establish a linguistic convention according to which every Collection object is automatically an element of itself. That would make the relation between a Collection object and its elements a bit more like the relation between a set and its subsets in set theory. Every set is a(n improper) subset of itself, because it contains all the elements it contains. It's an idea, but I imagine it would make matters more confusing rather than less.

I say we classify values, not objects, as elements. What do you think?

Top comments (0)