First of all, I need to point out that, as is usual when we talk about programming and design, this is my personal opinion. Still, I will try to explain my position as best I can.
So, what is an Iterator? As the wiki page says, an iterator is an object that allows us to traverse a container. Another definition that I found on the internet is that an iterator is an abstraction of a pointer to an element of a sequence.
Imagine that you have a container with some elements, maybe a PHP array. An iterator will point to an element of that array and also provide a way to get a pointer to the next element of that array.
It is worth mentioning that iterators can also be used with containers that do not really exist in memory. For example, you could create an iterator in which each element is the next odd number. This iterator points to an imaginary container that holds the infinite sequence of all odd numbers. Iterators are also used for lazy loading where, for example, you iterate over data that lives on a server but only load it at each iteration.
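To make the odd-number idea concrete, here is a minimal sketch. The class name `OddNumberIterator` is my own invention for illustration, and the typed property assumes PHP 7.4 or later. Nothing is stored anywhere: each value is computed from the current position.

```php
<?php

// Hypothetical example: iterates over the (conceptually infinite) sequence of odd numbers.
final class OddNumberIterator implements \Iterator
{
    private int $position = 0; // zero-based index of the current odd number

    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return true; } // the sequence never runs out
    public function current(): int { return 2 * $this->position + 1; }
    public function key(): int { return $this->position; }
    public function next(): void { $this->position++; }
}

// Take the first five values; without the break the loop would never end.
$firstFive = [];
foreach (new OddNumberIterator() as $odd) {
    $firstFive[] = $odd;
    if (count($firstFive) === 5) {
        break;
    }
}
// $firstFive is now [1, 3, 5, 7, 9]
```

Because `valid()` always returns true, it is up to the caller to decide when to stop.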
Iterators appear in many languages, such as Java, C++, JavaScript, and Rust. The journey for iterators in PHP started in PHP 4. That is when the foreach loop was added, and you could use it with arrays with this syntax:
$array = [
    'first' => 1,
    'second' => 2,
];
foreach ($array as $key => $value) {
    // ...
}
In the next major version, iterators were added using two interfaces: Iterator and IteratorAggregate. I will not focus on the IteratorAggregate right now because it does not represent an iterator in itself but actually something that we can get an iterator from.
The main focus here will be the Iterator interface. So, let's take a look into its methods:
interface Iterator extends Traversable {
    /* Methods */
    public current ( void ) : mixed
    public key ( void ) : scalar
    public next ( void ) : void
    public rewind ( void ) : void
    public valid ( void ) : bool
}
If you implement this interface, your object can be iterated either by manually calling the next, current, and valid methods or by using the same foreach loop that you use for arrays. Inside a foreach, the following methods will be called:
/** First Iteration */
rewind();
valid();
current();
key();
/** Remaining Iterations until valid() returns false */
next();
valid();
current();
key();
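The same sequence can be written out by hand. The sketch below uses SPL's `\ArrayIterator` as a stand-in for any iterator; the variable names are only for illustration, and the loop is an approximation of what foreach does, not a description of the engine's internals.

```php
<?php

$iterator = new \ArrayIterator(['first' => 1, 'second' => 2]);

$seen = [];
// Roughly what `foreach ($iterator as $key => $value)` does behind the scenes:
// rewind() once, then valid(), current(), key() and next() on every pass.
for ($iterator->rewind(); $iterator->valid(); $iterator->next()) {
    $value = $iterator->current();
    $key = $iterator->key();
    $seen[$key] = $value;
}
// $seen is now ['first' => 1, 'second' => 2]
```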
And, to demonstrate, here is an implementation of an iterator that reads a file line by line:
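class FileIterator implements \Iterator {
    /** @var resource Open file handle */
    private $resource;
    /** @var int Number of lines consumed so far, used as the key */
    private $read_lines = 0;
    /** @var string|null Current line, or null once the end of the file is reached */
    private $current_line = null;
    public function __construct(string $filename) {
        $resource = fopen($filename, 'r');
        if ($resource === false) {
            throw new \RuntimeException("Unable to open file: $filename");
        }
        $this->resource = $resource;
    }
    public function rewind(): void {
        rewind($this->resource);
        $this->read_lines = 0;
        $this->fetchLine();
    }
    public function next(): void {
        $this->fetchLine();
        $this->read_lines++;
    }
    public function current(): string {
        // Only meaningful while valid() returns true.
        return $this->current_line ?? '';
    }
    public function key(): int {
        return $this->read_lines;
    }
    public function valid(): bool {
        // Checking feof() here would drop the last line of a file that
        // does not end with a newline, so track the fgets() result instead.
        return $this->current_line !== null;
    }
    private function fetchLine(): void {
        $line = fgets($this->resource);
        $this->current_line = $line === false ? null : rtrim($line, "\r\n");
    }
}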
Now that we know what iterators are, where we use them, and how they work in PHP, I want to point out three design flaws of PHP iterators. I will list them from what I think is the least serious to the most serious.
The key() method
The first design flaw is the key() method. My problem with it is that, because it is part of the interface, everyone who wants to create an iterator needs to implement some kind of key for each iteration.
There are times when this is useful. For example, in an iterator over a database table, we can use the table's primary key as the key.
But imagine another iterator that iterates over a list of prime numbers. What would we use as a key? Look at the FileIterator example. There, we are just incrementing a local property as the key, and this is a very common approach to this problem, even though we may never actually need to use it.
I imagine that the reason for key() is foreach. As I said earlier, foreach came as syntactic sugar for iterating over an array. Since every element of an array has an index, it makes sense to also provide an easy way to access the index of each element during the iteration. When the Iterator interface was added, the PHP core team decided to also support foreach with iterators and, to avoid changing the existing syntax, they added a key to the iterator.
The current() and valid() methods
The problem with the current() and valid() methods is the horrible temporal coupling. We always need to call valid() to check that the iterator is in a valid state before calling current(). Calling current() without checking that the iterator is in a valid state is undefined behavior.
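A small sketch of why this coupling matters, using SPL's `\ArrayIterator`. As far as I know, its `current()` returns null once the iterator has moved past the last element; other iterators may behave differently, which is exactly the problem. The variable names are mine.

```php
<?php

// The only element is a legitimate null value.
$iterator = new \ArrayIterator([null]);
$iterator->rewind();

$validBefore = $iterator->valid();   // true: positioned on the element
$valueBefore = $iterator->current(); // null: the stored element itself

$iterator->next();                   // move past the last element

$validAfter = $iterator->valid();    // false: the iteration is over
$valueAfter = $iterator->current();  // null again, but now it means "nothing here"
```

The two null results cannot be told apart without also consulting `valid()`, so `current()` alone never tells the whole story.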
A slightly better approach would be to get rid of the valid() method and give the current() method a way to signal that the iteration is over. An even better approach would be to also remove the current() method and change the next() method to return the value, as is done in other languages like JavaScript, Rust, and Python.
Another annoying thing about having separate valid() and current() methods is that it is quite common to have to store extra data inside the iterator. We can see this in the FileIterator, where we save each line read when next() is called so that it can be returned by the current() method.
The rewind() method
Now it is time for what I think is the worst design flaw: the rewind() method. It resets the iterator so that we can iterate over it again. The problem is that not all iterable things can or should be rewindable. The correct approach would probably have been to have another interface, something like \RewindableIterator, that extends the iterator interface with the rewind method.
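Here is a minimal sketch of what that split could look like. None of this exists in PHP: `ForwardIterator`, `RewindableIterator`, `Countdown`, and `sumTwice` are hypothetical names, and the code assumes PHP 8.0 or later for the `mixed` type and constructor promotion.

```php
<?php

// Hypothetical interface, not part of PHP: a forward-only iterator...
interface ForwardIterator
{
    public function current(): mixed;
    public function next(): void;
    public function valid(): bool;
}

// ...and an opt-in extension for sources that can start over.
interface RewindableIterator extends ForwardIterator
{
    public function rewind(): void;
}

// A countdown can be restarted, so it implements the richer interface.
final class Countdown implements RewindableIterator
{
    private int $current;

    public function __construct(private int $start)
    {
        $this->current = $start;
    }

    public function current(): mixed { return $this->current; }
    public function next(): void { $this->current--; }
    public function valid(): bool { return $this->current > 0; }
    public function rewind(): void { $this->current = $this->start; }
}

// Code that needs two passes can demand RewindableIterator in its signature.
function sumTwice(RewindableIterator $iterator): int
{
    $sum = 0;
    for ($pass = 0; $pass < 2; $pass++) {
        for ($iterator->rewind(); $iterator->valid(); $iterator->next()) {
            $sum += $iterator->current();
        }
    }
    return $sum;
}

$total = sumTwice(new Countdown(3)); // (3 + 2 + 1) twice
```

With this split, the type system tells the caller whether a second pass is possible. Note that these user-defined interfaces would not work with foreach, since only Iterator and IteratorAggregate can make an object Traversable.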
You might ask, which iterator is not rewindable? A good example can be found in the \Generator class. As stated in the PHP manual, generators are forward-only iterators. This means that we cannot rewind one after we start iterating over it. And what happens if we try to do so? We receive a horrible exception, resulting in a very nasty violation of the Liskov Substitution Principle.
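The behavior is easy to reproduce. In this sketch, `countToThree` is just an illustrative generator function; the first `rewind()` succeeds because nothing has been consumed yet, while the second one, after the generator has advanced, throws.

```php
<?php

function countToThree(): \Generator
{
    yield 1;
    yield 2;
    yield 3;
}

$generator = countToThree();
$generator->rewind();           // fine: nothing has been consumed yet
$first = $generator->current(); // 1
$generator->next();             // advance past the first yield

$message = null;
try {
    $generator->rewind();       // the generator has already advanced
} catch (\Exception $e) {
    $message = $e->getMessage(); // rewinding is refused at runtime
}
```

So a `\Generator` satisfies the Iterator type on paper, yet code that relies on `rewind()` only finds out at runtime that it cannot be used that way.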
Another thing that really grinds my gears is that we always need to call rewind() before we start iterating over an iterator. Calling any method of an iterator without calling rewind() first is also undefined behavior. Now, imagine that we have a method that receives an iterator. If we check whether the iterator is valid and it is not, we don't know whether that is because the iteration is over or because no one has called rewind() on it. This gets worse because we can't call rewind() on an already started generator, as doing so would throw an exception.
Better approaches
Looking at how other languages handle this, I personally like two other approaches.
The first one comes from Rust. There, the iterator has only one required method: next(). The next method returns an Option that can be either Some(value) or None. When None is returned, it means that the iteration is over. I played a little bit with this approach in this project, where I basically rewrote the Rust iterator in PHP. I also added some adapters to better integrate it with the current PHP approach.
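To give a feel for the Rust-style design in PHP, here is a minimal sketch. It is not the code from the project mentioned above: the `Option` class, the `OptionIterator` interface, and `ListOptionIterator` are hypothetical names of my own, and the code assumes PHP 8.0 or later.

```php
<?php

// Hypothetical Option type, loosely modelled on Rust's Option<T>.
final class Option
{
    private function __construct(private bool $isSome, private mixed $value) {}

    public static function some(mixed $value): self { return new self(true, $value); }
    public static function none(): self { return new self(false, null); }

    public function isSome(): bool { return $this->isSome; }

    public function unwrap(): mixed
    {
        if (!$this->isSome) {
            throw new \LogicException('Called unwrap() on None');
        }
        return $this->value;
    }
}

// Hypothetical single-method iterator interface.
interface OptionIterator
{
    public function next(): Option;
}

final class ListOptionIterator implements OptionIterator
{
    private int $index = 0;

    public function __construct(private array $items) {}

    public function next(): Option
    {
        if ($this->index >= count($this->items)) {
            return Option::none(); // iteration is over
        }
        return Option::some($this->items[$this->index++]);
    }
}

$iterator = new ListOptionIterator([10, null, 30]);
$collected = [];
while (($item = $iterator->next())->isSome()) {
    $collected[] = $item->unwrap();
}
// $collected is now [10, null, 30]
```

Because the end of the iteration is signalled by None rather than by the value itself, a null element in the middle of the list causes no ambiguity, and there is no valid() or rewind() to forget.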
Another solution is the Python way. The Python iterator also has a single method, __next__(), which is usually called through the built-in next() function. What differs from Rust is that an exception, StopIteration, is raised when we call next on a completed iterator. I personally have my problems with this because we are using an exception to control the flow of the program. But in a controlled way, it might be a good approach.
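Translated to PHP, the Python approach could look like the sketch below. The `StopIteration` exception class and `RangeIterator` are hypothetical names for illustration only (PHP has no built-in equivalent), and constructor promotion assumes PHP 8.0 or later.

```php
<?php

// Hypothetical exception, named after Python's StopIteration.
final class StopIteration extends \Exception {}

final class RangeIterator
{
    private int $current;

    public function __construct(int $start, private int $end)
    {
        $this->current = $start;
    }

    /** Returns the next value, or throws StopIteration when exhausted. */
    public function next(): int
    {
        if ($this->current >= $this->end) {
            throw new StopIteration();
        }
        return $this->current++;
    }
}

$iterator = new RangeIterator(0, 3);
$values = [];
try {
    while (true) {
        $values[] = $iterator->next();
    }
} catch (StopIteration $e) {
    // Normal end of the iteration, not an error.
}
// $values is now [0, 1, 2]
```

The try/catch around the loop is the price of this design: in Python the for statement hides it, while in PHP it would have to be written by hand or wrapped in an adapter.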
Summing up
As stated above, I think that the PHP iterator has some major design flaws that make working with it quite annoying. In this article, I tried to explain why I think so and showed some other approaches. Nevertheless, I have to say thanks to the PHP core team because, overall, I think that the PHP language as a whole has evolved a lot over the last few years.
Top comments (2)
I don't know why people hate using exceptions to control flow. I do not have any concerns about it. It is not the best thing to do, but with caution it does not harm anyone.
Just a curious fact: I once wrote a simple Uno game in Java as an experiment, and the game flow was completely controlled by exceptions, but that's a long story. I should write a blog post about it here.
The problem with exceptions is that they can lead to inconsistent states. So yeah, you can use it to control flow if you are VERY careful. But I think that almost always there is a better approach.