DEV Community

Dash One
Dash One

Posted on • Edited on

Chain - a Goofy, Functional, Tree-backed List

What?

Java has array-based Lists for efficient random access; there are LinkedList for efficient appending. Who needs a tree for List?

Well, hear me out, just for the fun alright?

Once upon a time

Agent Aragorn (Son of Arathorn), Jonny English and Smith collaborated on a bunch of missions.

The Mission class's signature is like:

abstract class Mission {
  abstract MissionId id();
  abstract Range<LocalDate> timeWindow();
  abstract ImmutableSet<Agent> agents();
}
Enter fullscreen mode Exit fullscreen mode

The goal is to create an ImmutableRangeMap<LocalDate, Set<Agent>> (RangeMap is a Guava collection that maps disjoint ranges to values) to account for all the agents during each time window. Note that missions can have overlapping time windows, and agents could work on multiple missions at the same time.

So for missions like:

missions = [{
  timeWindow: [10/01..10/30]
  agents: [Aragorn, English]
}, {
  timeWindow: [10/15..11/15]
  agents: [Aragorn, Smith]
}]
Enter fullscreen mode Exit fullscreen mode

I want the result to be:

[10/01..10/15): [Aragorn, English]
[10/15..10/30]: [Aragorn, English, Smith]
(10/30..11/15]: [Aragorn, Smith] 
Enter fullscreen mode Exit fullscreen mode

At first I thought to use the toImmutableRangeMap() collector, as in:

missions.stream()
    .collect(toImmutableRangeMap(Mission::timeWindow, Mission::agents));
Enter fullscreen mode Exit fullscreen mode

Voila, done, right?

Not quite. My colleague pointed out that toImmutableRangeMap() does not allow overlapping ranges. It wants all input time windows to be disjoint.

RangeMap can merge()

The TreeRangeMap class has a merge() method that already does the heavylifting: finds overlappings and splits the ranges, and then merges the values mapped to the overlapping subrange.

With some effort, I created a toImmutableRangeMap(merger) BiCollector on top of the merge() function.

So if what I needed is just to count the number of agents, I could have done:

import static com.google.mu.util.stream.BiStream.biStream;

ImmutableRangeMap<LocalDate, Integer> agentCounts =
    biStream(missions)
        .mapKeys(Mission::timeWindow)
        .mapValues(mission -> mission.agents().size())
        .collect(toImmutableRangeMap(Integer::sum));
Enter fullscreen mode Exit fullscreen mode

(It'll double count the duplicate agents though)

Anyhoo, here goes the interesting part: how do I merge the Set<Agent>?

Quadratic runtime

I could use Guava's Sets.union():

import com.google.common.collect.Sets;

ImmutableRangeMap<LocalDate, ImmutableSet<Agent>> agentsTimeline =
    biStream(missions)
        .mapKeys(Mission::timeWindow)
        .mapValues(mission -> mission.agents())
        .collect(toImmutableRangeMap((set1, set2) -> Sets.union(set1, set2).immutableCopy()));
Enter fullscreen mode Exit fullscreen mode

The gotcha is that each time merging happens, merging two original sets into one is O(n) where n is the number of agents from the two overlapping ranges. If we are unlucky, we can get into the situation where a time window is repetitively discovered to overlap with another time window, and we keep copying and copying over again. The time complexity is quadratic.

Stack overflow

Could I remove the .immutableCopy()? Sets.union() returns a view that takes constant time so we should be good?

Not really. We don't know how many times merging will happen, a Set can be unioned, then unioned again for unknown times. In the worst case, we'd create a union-of-union-of-union N levels deep. If N is a large number, we'll stack overflow when we try to access the final SetView!

The same will happen if for example I use Iterables.concat() or Stream.concat() (javadoc discusses this problem).

And in case it wasn't obvious, the merging cannot modify either of the two lists or sets, because they are still associated with the sub-range that doesn't overlap. So we need it to be immutable.

If I had one of the persistent collections in the dependencies, I might just use it (they offer O(logn) performance for concatenation but usually are close to constant time). But I don't. And it doesn't feel like worth it to pull in one of such libraries for a single use case.

Put it in a tree

I slept on this problem for about two days for an idea to come to me: can we use something like Haskell's List?

Tl;dr, Haskell's List is like LinkedList except it's immutable. So given a list of [2, 3], you can cons the number 1 onto the list to get a new instance of [1, 2, 3]. Under the hood it's as simple as creating a new object with the internal tail pointer pointing to the old [2, 3] list.

If I can do this, each time merging happens, I only need to pay O(1) cost. The resulting object is probably less efficient for random access than ArrayList or Guava's ImmutableList because of all the pointers and indirections. But that's okay. When the whole split-merge process is done, I can perform a final copy into ImmutableList, which is O(n).

The only problem? Haskell's cons only allows to add one element, while I have two List<Agent>s to concatenate (I can't cons every element from one of the lists, because then I'm getting back to quadratic).

To support concat(list1, list2), I decided to use a binary tree to represent the List's state:

private static final class Tree<T> {
  final T mid;
  @Nullable final Tree<T> left;  // null means empty
  @Nullable final Tree<T> right;  // null means empty

  Tree(Tree<T> left, T value, Tree<T> right) {...}
}
Enter fullscreen mode Exit fullscreen mode

In the list, the elements in left show up first, followed by mid, then followed by the elements in right. In other words, an in-order traversal will give us back the list.

The key trick is to figure out how to concatenate two binary trees into one. Intuitively, I need to find the new "mid point" value, which can be either the left tree's last element, or the right tree's first element. Say, if I take the right tree's first element, then the new tree's left remains the old left, while the new tree's right would need to be the old right after removing the first element.

Wrap it up

Since the Tree is immutable, how do I remove any element at all? And in a binary tree, finding the first element takes up to O(n) time (it's not a balanced tree).

It turns out there's a law in computer science:

All problems in computer science can be solved by another level of indirection

In human language: if a problem can't be solved with one layer of indirection, add two :)

Here goes my second layer of indirection that handles the remove first element from an immutable list task:

public final class Chain<T> {
  private final T head;
  @Nullable private final Tree<T> tail;

  public static <T> Chain<T> of(T value) {
    return new Chain<>(value, null);
  }

  public static <T> Chain<T> concat(Chain<T> left, Chain<T> right) {
    T newHead = left.head;
    Tree<T> newTail = new Tree<>(left.tail, right.head, right.tail);
    return new Chain<T>(newHead, newTail);
  }
}
Enter fullscreen mode Exit fullscreen mode

This is quite like Haskell's cons list, except the tail is a binary tree instead of another cons list.

Now because both left and right Chain already have the first element readily accessible, I can just take the right head as the new mid point to build the new tree, with the tail from left and the tail from right. This new Tree maintains the order invariant as in left.tail -> right.head -> right.tail. And of course the left's head becomes the new Chain head.

It takes a bit of brain gymnastics. But if you sit down and think for a minute, it's actually pretty straight forward.

This solves the O(1) concatenation. And the good thing is that, no matter how deep concat() is nested, the result is always one layer of Chain with a heap-allocated Tree object.

Now we just need to make sure when we iterate through the Chain, we take no more than O(n) time, and constant stack space.

Flattening tree back to List

My secret weapon is Walker.inBinaryTree() (but you can create your own - it's standard tree traversal stuff). It already does everything I needed:

  1. O(n) time lazy in-order traversal.
  2. Constant stack space.

Using it is pretty simple. First we add a stream() method to the Tree class:

private static final class Tree<T> {
  ...

  Stream<T> stream() {
    return Walker.<Tree<T>>inBinaryTree(t -> t.left, t -> t.right)
        .inOrderFrom(this)
        .map(t -> t.mid);
  }
}
Enter fullscreen mode Exit fullscreen mode

The inOrderFrom() method returns a lazy stream, which will take at the worst case O(n) heap space and constant stack space.

Then we wrap and polish it up in our wrapper Chain class:

public final class Chain<T> {
  ...

  /**
   * Returns a <em>lazy</em> stream of the elements in this list.
   * The returned stream is lazy in that concatenated chains aren't consumed until the stream
   * reaches their elements.
   */
  public Stream<T> stream() {
    return tail == null
        ? Stream.of(head)
        : Stream.concat(Stream.of(head), tail.stream());
  }
}
Enter fullscreen mode Exit fullscreen mode

With that, it gives me O(n) time read access to the tree and I can easily collect() it into an ImmutableList.

In the actual implementation, I also made Chain implements List to make it nicer to use, and used lazy initialization to pay the cost only once. But that's just some extra API makeup. The meat is all here.

In Conclusion

Chain is a simple immutable List implementation that you can append, concatenate millions of times.

A bit of googling shows that people have run into similar needs but I didn't find a similar implementation that handles both the O(1) concatenation time and stack overflow concern.

Top comments (0)