Noble-47

Posted on Feb 23

Leveraging The Power Of Iteration Using Python Data Model

#python #oop #programming #coding

Iteration is a core part of many built-in objects in Python, such as list and tuple objects. As with many other Python features, the Python data model gives us a way to create iterable objects that work well with Python's idiomatic features, giving our objects the powers needed to accomplish iterative tasks.

This article is a follow-up to my previous Python Data Model article - Special Methods. In this article, I assume that you are already familiar with a few dunder/special methods, including those for string representation and comparisons. If you are unfamiliar with any of these, check out my previous post, which covers what you need to continue with this article.

To make iterable objects, we need a good understanding of how Python implements iteration and what is required to make our own iterable objects. To explain iteration, we will be creating a coordinate geometry point object, which is a bit mathematical, as was mentioned in the previous post. Along with making our objects iterable, we will also give them other cool functionalities that are common to coordinate points. We will begin by creating a coordinate geometry point class.

The Point Object

For our Point object, we really just want to create a simple object that works well with the x and y coordinate system (the Cartesian plane) and has some functions that are common to a coordinate point. We will begin our class definition by defining the dunder init method and the parameters needed.

class Point:
    def __init__(self, coordinates): 
        """
         coordinates : a list or tuple of two numbers,the first
                       specifies the x-axis value,the second
                       specifies the y-axis value
        """
        self.coordinates = tuple(coordinates)

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self.coordinates)}])"

Our simple Point class requires only a single argument to be initialized: coordinates, which is expected to be something that can be turned into a tuple object. This simply means that coordinates can be a list or any other iterable that yields two numbers. We accept an iterable of numbers mainly because it lets us extend our class easily. If in the future we decide to model an n-coordinate system with more than two axes, a Point class that takes x and y as separate arguments for a 2-dimensional system would be quite difficult to extend, especially if n is much larger than 2. However, in this article, most of our methods are written with a 2-dimensional coordinate system in mind. Why? Simple is better than complex.

A quickly noticeable problem with the way the Point class is defined is that the x and y axis values of an instance have to be accessed through self.coordinates, which is not ideal. The problem with letting users access the coordinates attribute of an instance is that the user could knowingly or unknowingly assign another value to it, which could break the code or cause unexpected behavior. Let us consider a few cases:

>>> p = Point((1,2)) # we pass in a tuple as coordinate (1,2)
>>> x,y = p.coordinates # tuple unpacking to access x and y values
>>> print(f'{x}, {y}') 
# return : 1, 2
>>> p.coordinates = None # reassign coordinates
>>> x,y = p.coordinates
# return : Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
TypeError: cannot unpack non-iterable NoneType object

Because we do not want users of our class to reassign the coordinates attribute of an instance, we have to provide read-only access to the values contained in coordinates, and we also have to declare coordinates as a private variable. Let's begin with the latter, making coordinates a private variable.

class Point:
    def __init__(self, coordinates):
        self._coordinates = tuple(coordinates)

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self._coordinates)}])"

Not much of a change, right? In our bid to make the coordinates attribute private, we only renamed it with a leading underscore. Python does not actually make a variable private; the underscore is just an accepted convention among Python programmers that tells others 'this variable is not to be accessed directly, and any error that results from you accessing it is completely on you'. For more about this, you can check out the GeeksForGeeks article and the official Python docs. Now that we have made coordinates 'private', we will turn our attention to accessing the values in _coordinates without exposing our 'private' variable. One way to do this is to make our Point instance an iterable object that yields the values in _coordinates one at a time, starting from index 0, whenever it is looped over, as in a for loop.

The first step we will take to make our object iterable is to get a good understanding of what Python considers iterable. To do that, we will consult our favorite doctor, the Python docs (documentation), and here's what it says:

iterable:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements sequence semantics. -- (python doc)
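The last clause of that definition can be checked with a short sketch. The Countdown class below is a hypothetical example, not part of our Point code. It defines no __iter__ method at all, yet it can still be looped over, because Python falls back to calling __getitem__ with 0, 1, 2, ... until an IndexError is raised.

```python
class Countdown:
    """Hypothetical example class: iterable through __getitem__ only."""

    def __init__(self, start):
        self._values = list(range(start, 0, -1))

    def __getitem__(self, index):
        # List indexing raises IndexError past the end, which ends the loop.
        return self._values[index]


result = [n for n in Countdown(3)]
print(result)  # [3, 2, 1]
```

In the rest of this article we take the __iter__ route instead, which is the more explicit of the two.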

For our object to be considered iterable, it must implement the __iter__ method or, alternatively, a __getitem__ method with sequence semantics. We will take the __iter__ route. The __iter__ method is expected to return an iterator object, as stated in the Python documentation, so we will also have to know what that is.

iterator:
An object representing a stream of data. Repeated calls to the iterator's __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop.

I know this is a bit of a lengthy text, but let me break it down for you. An iterator object represents a stream of data and must have two methods: __iter__ and __next__. The __next__ method is responsible for returning successive (the next) items in the stream, or raising StopIteration if the stream is exhausted, while the __iter__ method should return the iterator object itself (self).
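To make the quoted definition concrete, here is a small sketch that uses only a built-in list. It shows that each call to iter() on a container produces a fresh iterator, that an iterator returns itself from iter(), and that an exhausted iterator stays exhausted.

```python
numbers = [1, 2, 3]          # a list is iterable, but it is not an iterator

first = iter(numbers)        # each call to iter() builds a fresh iterator
second = iter(numbers)

print(first is second)       # False: two independent iterators
print(iter(first) is first)  # True: an iterator's __iter__ returns itself

print(next(first))           # 1
print(next(first))           # 2
print(next(second))          # 1, because `second` keeps its own position

print(list(first))           # [3], only the remaining item
print(list(first))           # [], the iterator is exhausted
print(list(numbers))         # [1, 2, 3], the list itself is unaffected
```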

Notice the difference between the iterator object and the iterable object in Python. An iterable object implements an __iter__ method that returns an iterator object (or, as a fallback, a __getitem__ method), while an iterator object implements a __next__ method, to be consumed by the built-in next function, and an __iter__ method that returns the iterator itself. We can say that all iterators are iterable, but not all iterable objects are iterators.

class Point:
    def __init__(self, coordinates):
        self._coordinates = tuple(coordinates)

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self._coordinates)}])"

    def __iter__(self):
        return PointIterator(self._coordinates)

class PointIterator:
    def __init__(self, coordinates):
        self.coordinates = coordinates
        self.count = 0

    def __next__(self):
        if self.count >= len(self.coordinates):
            raise StopIteration
        value = self.coordinates[self.count]
        self.count += 1
        return value

    def __iter__(self):
        return self

Our PointIterator class satisfies the requirements of an iterator, and just by creating and returning an instance of PointIterator in Point's __iter__() method, we have successfully made our Point objects iterable. Let's have a quick walkthrough of the PointIterator class definition.

As with most other class definitions, our iterator class has an __init__ method. It takes a single coordinates argument and initializes the coordinates and count attributes of the iterator. The coordinates are those of our Point object; for our use case, a tuple. The count attribute tracks which value in self.coordinates the __next__ method should return next.

The __next__ method is where most of the heavy lifting is done. It first checks whether self.count is greater than or equal to the length (len) of the coordinates tuple. We use the >= comparison because Python indexing starts from zero, so a tuple of two objects has a length of 2 but a last index of 1. As stated in the Python docs, we raise StopIteration once self.count has reached len(self.coordinates). Otherwise, we read the element at index self.count, increment count by 1, and return the element. The __iter__ method simply returns self, the same iterator object on which the built-in function iter was called. We can now check what new capabilities our Point instance has:

# creating a 3 dimensional coordinate system point
>>> p = Point((1,2,3))

>>> print(p)
# return : Point([1,2,3])

# call the iter function on p
>>> it = iter(p)
>>> it 
# return : <__main__.PointIterator object at 0x7...> the repr of 'it'

# call next on it
>>> next(it)
# return : 1 the first number in coordinates

>>> next(it)
# return : 2

>>> next(it)
# return : 3

>>> next(it)
# return : Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
  File "test.py", line 22, in __next__ 
    raise StopIteration 
StopIteration
# A StopIteration is raised as expected

"""Our object can also work well with for loops"""
>>> for coord in p:
...    print(coord)
# returns : 
1 
2 
3
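Roughly speaking, the for loop in the session above is shorthand for a call to iter() followed by repeated calls to next() until StopIteration is raised. The sketch below spells this out. The function name manual_loop is hypothetical, and a plain tuple stands in for a Point instance so that the snippet runs on its own.

```python
def manual_loop(iterable):
    """Collect items the way a for loop would fetch them."""
    collected = []
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # a for loop ends silently at this point
        collected.append(item)
    return collected


print(manual_loop((1, 2, 3)))  # [1, 2, 3]
```

This is why we never see the StopIteration traceback when looping over p: the for loop catches it for us.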

We have now successfully made our Point instance iterable, but there are other ways to achieve this with far fewer lines of code. I'll show you two ways to quickly make our objects iterable without having to create our own iterator class.

Falling On The Tuple Iterator Object

We already know that objects like tuples and lists are iterables. This means that they also have iterator objects that handle iteration behind the scenes. Let's take a peek:

>>> it = iter(tuple([1,2,3]))
>>> it
# return : <tuple_iterator object at 0x...>
>>> next(it)
# return : 1

What this means for us is that we can make use of the tuple iterator for our objects, since self._coordinates is a tuple. All we have to do is modify the __iter__ method in our Point class:

class Point:
    def __init__(self, coordinates):
        self._coordinates = tuple(coordinates)

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self._coordinates)}])"

    def __iter__(self):
        return iter(self._coordinates)

Let's test it…

>>> p = Point((1,2,3))
>>> for coords in p:
...    print(coords)
...
# return:
1 
2 
3

The output is exactly the same as before, but with far less code.

Using Generator Expression

This approach is very common, simple, and generally considered Pythonic. Let's see how it is done:

class Point:
    def __init__(self, coordinates):
        self._coordinates = tuple(coordinates)

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self._coordinates)}])"

    def __iter__(self):
        points = (x for x in self._coordinates)
        return points

the expression

(x for x in self._coordinates)

evaluates to a generator object. A generator is itself an iterator, so it is a valid return value for __iter__. A simple example:

>>> p = Point((1,2,3))
>>> iter(p)
# return : <generator object Point.__iter__.<locals>.<genexpr> at 0x...>
>>> for coord in p:
...    print(coord)
...
# return :
1 
2 
3

Generators are lazy: they produce each value only when it is requested instead of building a new collection up front. In our case the coordinates are already stored in a tuple, so the saving is small, but for a point with a very large number of dimensions, or when each value has to be computed or transformed on the fly, a generator expression is the right tool for the job. Just as a tip, because our object is now iterable, we can change our repr code to look better…

    ... 
    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self)}])"

This works just fine because self, an instance of our class, is now iterable. One more thing to notice: we can shorten our __iter__() method by returning the generator directly, without creating the points variable...

return (x for x in self._coordinates)

But I find the first to be more readable. Now that we have made our object iterable, there's still a little more work to do.
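Before moving on, here is a brief sketch of what iterability already buys us. It uses a stripped-down Point that keeps only __init__ and __iter__, so the snippet runs on its own. Note that we can now read the x and y values without ever touching _coordinates, which was the original goal.

```python
class Point:
    def __init__(self, coordinates):
        self._coordinates = tuple(coordinates)

    def __iter__(self):
        return iter(self._coordinates)


p = Point((3, 4))

x, y = p                             # tuple unpacking calls iter(p)
as_list = list(p)                    # any constructor that accepts an iterable
largest = max(p)                     # built-ins that consume iterables
squared_sum = sum(v * v for v in p)  # 3*3 + 4*4

print(x, y, as_list, largest, squared_sum)  # 3 4 [3, 4] 4 25
```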

First I'd like to address a little problem with our object. Making our object iterable did not make it subscriptable. In Python that means we cannot access the coordinates with its index

>>> p = Point((2,3))
>>> p[0]
# return : Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
TypeError: 'Point' object is not subscriptable

To make our object subscriptable, we have to give it the special method Python calls when handling subscription: the __getitem__ method.

class Point:
    def __init__(self, coordinates):
        self._coordinates = tuple(coordinates)

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self._coordinates)}])"

    def __iter__(self):
        return iter(self._coordinates)

    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, int):
            return self._coordinates[index]

        if isinstance(index, slice):
            return cls(self._coordinates[index])

        raise TypeError(f'{cls.__name__} indices must be integers or slices, not {type(index).__name__}')

Our major focus is the __getitem__ method. It requires an index argument, which should be an int or a slice object. If it is neither, a TypeError is raised. To check the type of index, we use the isinstance function. Depending on whether index is an int or a slice object, and in line with how Python's list and tuple behave, we return a single value or a new instance of Point that contains only the values that fall into the slice. type(self) is just a dynamic way of getting the class of self without hard-coding it or passing it as an argument to __getitem__. The error message is a simple modification of what a list would raise. Let's test our object:

>>> p = Point((2,3,4)) # A 3-dimensional coordinate point
>>> p[0] # using an int as index
# return : 2
>>> p[-1] # negative indexing
# return : 4
>>> p[:2] # using a slice as index
# return : Point([2,3]) 
>>> p['a'] # using a str for index, expecting a TypeError
# return : Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
  File "test.py", line 18, in __getitem__ 
    raise TypeError(f"{cls.__name__} indices must be integers or slices, not {type(index).__name__}")    
TypeError: Point indices must be integers or slices, not str

With this, our object now has full support for slicing and indexing, as well as a nicely formatted error message.

Another feature I would like our Point object to have is read-only attributes, or labels, whose values correspond to coordinates in the Point. For example, we can give a 2-dimensional point the label 'xy', where point.x returns the coordinate at index 0 and point.y returns the coordinate at index 1. This can be done by adding __getattr__ and __setattr__ methods to our object.
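If it is unclear where the slice object comes from, the following sketch may help. ShowIndex is a hypothetical helper class whose __getitem__ simply hands back whatever Python passed in, which makes it easy to see how the square-bracket syntax is translated.

```python
class ShowIndex:
    """Hypothetical helper: returns whatever Python passes to __getitem__."""

    def __getitem__(self, index):
        return index


s = ShowIndex()
print(s[0])      # 0
print(s[-1])     # -1
print(s[:2])     # slice(None, 2, None)
print(s[1:5:2])  # slice(1, 5, 2)

# A slice object can be applied to a tuple directly, which is what
# Point.__getitem__ relies on when it indexes self._coordinates.
print((2, 3, 4)[slice(None, 2)])  # (2, 3)
```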

class Point:
    def __init__(self, coordinates, label=None):
        self._coordinates = tuple(coordinates)
        self._label = label

    def __repr__(self):
        return f"Point([{','.join(str(v) for v in self._coordinates)}])"

    def __iter__(self):
        return iter(self._coordinates)

    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, int):
            return self._coordinates[index]

        if isinstance(index, slice):
            return cls(self._coordinates[index])

        raise TypeError(f'{cls.__name__} indices must be integers or slices, not {type(index).__name__}')

    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        cls = type(self)
        msg = f"{cls.__name__} object has no attribute {name}"
        if self._label is None:
            raise AttributeError(msg)
        if len(name) == 1:
            l = self._label.find(name)  # -1 when name is not in the label
            if 0 <= l < len(self._coordinates):
                return self._coordinates[l]
        raise AttributeError(msg)

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if self._label is not None and name in self._label:
                err = f"readonly attribute {name}"
            elif name.islower():
                err = f"cannot set attribute 'a' to 'z' in {cls.__name__}"
            else:
                err = ""
            if err:
                raise AttributeError(err)
        super().__setattr__(name, value)

The __getattr__ method serves as a fallback that Python calls whenever it cannot find an attribute on the object, its class, or any of its parent classes. To implement our desired feature, our __init__ method takes an optional label argument. The label parameter is None by default, meaning that this feature isn't available unless the user passes a label argument when creating the object. Before going into the technicalities, let's view the new behavior of our Point object:

>>> p = Point((1,2,3), label = 'xyz')
>>> p
# return : Point([1,2,3])
>>> p.x # get the x-axis value
# return : 1
>>> p.y # get the y-axis value
# return : 2
>>> p.z # get the z-axis value
# return : 3
>>> p.x = 10 # assign x attribute of p to a different number
# return : Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
  File "test.py", line 45, in __setattr__ 
    raise AttributeError(err) 
AttributeError: readonly attribute x
>>> p.j = 2 # setting a new single letter lower case attribute
# return : Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
  File "test.py", line 45, in __setattr__ 
    raise AttributeError(err) 
AttributeError: cannot set attribute 'a' to 'z' in Point
>>> p.some_attribute = 'some_value' # no error
>>> p.some_attribute
# return : 'some_value'

By defining __getattr__ and __setattr__, we have given our class instances a whole new ability. As stated before, the __getattr__ method lets an instance return specific values based on the label used, but on its own it does not stop a user from creating a new attribute with the same name as one of our labels. We can comment out the __setattr__ block of code and try the following:

>>> p = Point((1,2,3), label = 'xyz')
>>> p.x
# return : 1
>>> p.x = 10
>>> p.x
# return : 10
>>> p[0] # p still has 1 as its first value
# return : 1

Remember I said that __getattr__ is a fallback method for whenever Python can't find an attribute (like 'x' in this case). Whenever we access p.x, since p by definition does not have an x attribute, __getattr__ is called. But once we have assigned an attribute x with the value 10 to p, p.x returns 10 without __getattr__ ever being called, and the values in self._coordinates are unaffected. To stop this behavior, we defined __setattr__, which Python calls whenever an attribute is assigned on our object.

By our definition, __setattr__ stops Python from assigning an attribute whose name is in label or is a single lowercase letter (because these are special to our object). For any other name, we fall back to the default behavior by calling super().__setattr__(name, value).

This is why some_attribute can be assigned to our object without raising an error.

Our Point object now has some cool functionalities but still lacks a lot. For starters, we cannot add or subtract two Point instances, find the magnitude of an instance, represent a two-dimensional point as a complex number, and much more. There may be ways to work around these limitations, but we can use special methods to implement them, and I will show you how in the next article.

It is important to state that our class definition makes a lot of assumptions without implementing a way to validate them. For instance, there's nothing stopping our user from passing an integer or a list as label instead of a string. At initialization there would be no problem, but when we try to get an attribute, __getattr__ calls find(name) on _label, and if _label is not a string, an exception is raised. We also didn't check that len(label) == len(coordinates), though it doesn't have much effect on our code.

For further study, I'd recommend Python's official docs as the major reference. You can also check out Fluent Python by Luciano Ramalho. Another resource I came across was this tweet and the follow-up tweet by Stephen Gruppetta, from which I learnt a great deal about iteration.

I hope you enjoyed this article and had fun making iterable objects.
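One possible way to validate these assumptions is sketched below. The helper name validate_label and the exact rules it enforces (a str, one unique character per coordinate) are assumptions made for illustration; they are not part of the Point class above, which could call such a helper from __init__ before storing the label.

```python
def validate_label(coordinates, label):
    """Hypothetical helper: check a label before storing it on a Point."""
    if label is None:
        return None
    if not isinstance(label, str):
        raise TypeError(f"label must be a str, not {type(label).__name__}")
    if len(label) != len(coordinates):
        raise ValueError("label needs exactly one character per coordinate")
    if len(set(label)) != len(label):
        raise ValueError("label characters must be unique")
    return label


print(validate_label((1, 2, 3), "xyz"))  # xyz

try:
    validate_label((1, 2), 12)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # label must be a str, not int
```

Failing early at construction time gives a clearer error than the one raised later from inside __getattr__.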
