My Initial Struggle with List Copying
When I began my programming journey in Python, I encountered a scenario that many new programmers face: copying a list and passing it to a function for further modification. Here's the approach I initially used and why it didn't work as expected.
def square_nums(nums_list):
    ...  # body omitted; it modifies nums_list in place
original_list = [1, 2, 3, 4, 5]
copied_list = original_list  # intended as a copy
squared_nums = square_nums(copied_list)
However, I soon realized a critical issue: any modifications made to copied_list within the function were also reflected in original_list. This was puzzling and led me to delve deeper into Python's documentation.
The Revelation: Aliases vs. Actual Copies
Upon reading the official documentation, I figured out that the way I was copying the contents of original_list was fundamentally flawed. Under Python's semantics, the assignment operator never copies an object: it only binds another name to it, so I was creating an alias rather than a copy. No matter how many aliases we create, they all point to the same object. That's why, when the object was modified inside the function, I saw the altered values in original_list as well.
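Here is a minimal sketch of the aliasing behaviour described above. The body of square_nums is not shown in this post, so the in-place version below is an assumption made purely for illustration.

```python
def square_nums(nums_list):
    # Hypothetical body (assumed for illustration): squares the items in place.
    for i, n in enumerate(nums_list):
        nums_list[i] = n * n
    return nums_list


original_list = [1, 2, 3, 4, 5]
copied_list = original_list          # an alias, not a copy

# Both names refer to the very same list object.
print(copied_list is original_list)  # True

squared_nums = square_nums(copied_list)

# The "original" changed too, because there was only ever one list.
print(original_list)                 # [1, 4, 9, 16, 25]
```

The is operator compares object identity, which makes it a quick way to check whether two names are aliases of one object.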
The Correct Approach to Copying in Python
Python provides two ways to create an actual copy:
1) Shallow Copy
2) Deep Copy
Both of these are available in the copy module provided as part of Python's standard library.
Shallow Copy
In a shallow copy, a new container object is created, but the items inside it are references to the same objects that the original container holds.
This can be verified by comparing the identities of the containers and of their individual items using the id() function.
This way of copying works best when the items are immutable.
A shallow copy is created with the copy.copy() function of the copy module.
import copy
original_list = [1, 2, 3]
# Creates a shallow copy
copied_list = copy.copy(original_list)
# Will print True
print(id(original_list[0]) == id(copied_list[0]))
original_list[0] = 0
# Will print False
print(id(original_list[0]) == id(copied_list[0]))
In the above code, the last print statement outputs False because assigning to original_list[0] rebinds that slot to a different integer object, 0, while copied_list[0] continues to reference the original integer object 1. Since integers are immutable, the shared object 1 itself can never be changed, so the copy is unaffected. This behaviour would be different if original_list contained mutable objects (like other lists, dictionaries, etc.): a change made to a mutable object inside original_list would be reflected in copied_list as well, due to the shared references.
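To make the mutable case concrete, here is a small sketch (the nested list is my own example, not taken from the code above) showing how a shallow copy shares nested mutable objects with the original.

```python
import copy

original_list = [1, 2, [3, 4]]
copied_list = copy.copy(original_list)

# The outer containers are different objects...
print(copied_list is original_list)        # False
# ...but the nested list is shared between them.
print(copied_list[2] is original_list[2])  # True

# Mutating the shared nested list is visible through both containers.
original_list[2].append(5)
print(copied_list)                         # [1, 2, [3, 4, 5]]

# Rebinding a slot in one container does not affect the other.
original_list[0] = 0
print(copied_list[0])                      # 1
```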
Due to the potential issues that could arise when mutable objects are involved, the copy module has another offering: deepcopy.
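Deep Copy
In a deep copy, a new container object is created and copies of the objects found in the original are recursively inserted into it. Not every object is actually duplicated: whether a new object is made depends on its type, and immutable atomic values such as integers are reused as they are.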
import copy
original_list = [1, 2, 3, [2, 3, 4]]
copied_list = copy.deepcopy(original_list)
# Will print False
print(id(original_list) == id(copied_list))
# Will print True
print(id(original_list[0]) == id(copied_list[0]))
# Will print True
print(id(original_list[1]) == id(copied_list[1]))
# Will print True
print(id(original_list[2]) == id(copied_list[2]))
# Will print False
print(id(original_list[3]) == id(copied_list[3]))
In the code above, we see something interesting: when the elements are immutable, like those at indexes 0, 1 and 2 in original_list, the corresponding elements in copied_list reference the same objects. But when the element is mutable, like a list, copied_list holds its own separate copy.
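The practical consequence can be sketched by comparing a shallow and a deep copy side by side. The variable names shallow and deep are my own, chosen for illustration.

```python
import copy

original_list = [1, 2, 3, [2, 3, 4]]
shallow = copy.copy(original_list)
deep = copy.deepcopy(original_list)

# Mutate the nested list through the original.
original_list[3].append(5)

# The shallow copy shares the nested list, so it sees the change.
print(shallow[3])  # [2, 3, 4, 5]
# The deep copy owns an independent nested list, so it does not.
print(deep[3])     # [2, 3, 4]
```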
The Role of the memo Dictionary in Deepcopy
Avoiding Redundant Copies:
When an object is copied using deepcopy, the memo dictionary keeps track of objects that have already been copied in that copy cycle. This is particularly important for complex objects that may refer to the same sub-object multiple times. Without memo, each reference to the same sub-object would result in a new and separate copy, potentially leading to an inefficient duplication of objects.
Handling Recursive Structures:
Recursive structures are objects that, directly or indirectly, refer to themselves. For example, a list that contains a reference to itself. deepcopy uses the memo dictionary to keep track of objects that have already been encountered in the copying process. If it encounters an object that's already in the memo dictionary, it uses the existing copy from the memo rather than entering an infinite loop trying to copy the recursive reference.
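Both roles of memo can be observed from the outside without touching the dictionary directly. The following is a small sketch with made-up variable names: the first part shows a shared sub-object being copied only once, the second shows a self-referencing list being copied without endless recursion.

```python
import copy

# 1) A sub-object referenced twice is copied once, and the copy is shared.
shared = [1, 2]
original = [shared, shared]
copied = copy.deepcopy(original)

print(copied[0] is copied[1])  # True: one copy, referenced twice
print(copied[0] is shared)     # False: it is still a new object

# 2) A list that contains itself can be deep-copied safely.
recursive = [1, 2]
recursive.append(recursive)
recursive_copy = copy.deepcopy(recursive)

# The copy refers to itself, not back to the original.
print(recursive_copy[2] is recursive_copy)  # True
print(recursive_copy[2] is recursive)       # False
```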
Implementation in User-Defined Classes
To customize how copy.copy() and copy.deepcopy() treat instances of a class, we can define the magic methods __copy__() and __deepcopy__() within the class. __deepcopy__() receives the memo dictionary as its argument.
Internally, even the list data structure is a class, and it provides its own list.copy() method, which returns a shallow copy.
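As a rough sketch of what this can look like, the hypothetical Playlist class below (the class and its attributes are invented for illustration) defines both __copy__ and __deepcopy__. This is one possible way to write them, not the only one.

```python
import copy


class Playlist:
    """Hypothetical class used only to illustrate the copy hooks."""

    def __init__(self, name, tracks):
        self.name = name
        self.tracks = tracks

    def __copy__(self):
        # Shallow copy: a new Playlist that shares the same tracks list.
        return Playlist(self.name, self.tracks)

    def __deepcopy__(self, memo):
        new = Playlist(self.name, [])
        # Register the new object before copying children, so that any
        # reference back to self resolves to the copy instead of recursing.
        memo[id(self)] = new
        new.tracks = copy.deepcopy(self.tracks, memo)
        return new


original = Playlist("mix", ["a", "b"])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original.tracks.append("c")
print(shallow.tracks)  # ['a', 'b', 'c']
print(deep.tracks)     # ['a', 'b']
```

Passing memo along to the nested copy.deepcopy() call is what lets shared and recursive references inside the object keep working as described in the previous section.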
With this deeper insight into Python's object copying, you're now better prepared to handle the nuances of mutable and immutable objects in your coding adventures. Happy coding!