I noticed that my recent daily Python quiz about zip (I mean the built-in, not the .zip archive) received a decent amount of attention. Thank you to each and every participant!
No wonder: iterables in Python are an essential tool for working with data. A few things make iterables popular and fun: performance, memory efficiency, and expressive syntax.
One of the hardest things to understand in the zoo of iterables is, indeed, zip. When I faced this animal for the first time, I realized its hidden power and felt disappointed at the same time 🤭
Actually, zip serves two purposes:
- First, to "sew" two (or more) lists together.
- Second, to tear them apart.
To illustrate this idea, I prepared the following example.
Suppose we have a column-oriented database (e.g. ClickHouse).
We are given 3 separate columns of data: first names, last names, and professions. Every column contains data about the people who use our software. In the real world, we could obtain this data from a DBMS, APIs, physical files, user input, etc.
# Three columns from the columnar database
first_names = ["Elon", "Steve", "Bob"]
last_names = ["Musk", "Jobs", "Dorf"]
professions = ["builds rockets", "grows Macs", "helps startups"]
"Sew" two or more lists together and make a table from columns using zip
The first scenario is to merge them into a single list of three persons, rather than three lists holding one field each. In other words, we want to build a table of 3 records from these 3 lists. This is the same way an RDBMS (e.g. Postgres) stores data, and the same way people from the OOP world think about it.
zip(first_names, last_names, professions)
This expression returns a lazy iterator. You're free to materialize it into a list whenever you wish and finally obtain the desired table in memory:
>>> table = list(zip(first_names, last_names, professions))
>>> table
[('Elon', 'Musk', 'builds rockets'),
('Steve', 'Jobs', 'grows Macs'),
('Bob', 'Dorf', 'helps startups')]
Now, table[0] is a record containing Elon Musk who builds rockets, table[1] is a record containing Steve Jobs who enjoyed growing Macs and Apples (rest in peace, dear Steve!), and so on.
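Since every record in the table is a plain tuple, it can be unpacked straight into named variables while looping. Here is a minimal sketch of that idea; the variable names first, last, job and sentences are only illustrative and not part of the original example.

```python
first_names = ["Elon", "Steve", "Bob"]
last_names = ["Musk", "Jobs", "Dorf"]
professions = ["builds rockets", "grows Macs", "helps startups"]

# "Sew" the three columns into a table of records (tuples).
table = list(zip(first_names, last_names, professions))

# Unpack each record into named variables (illustrative names).
sentences = [f"{first} {last} {job}" for first, last, job in table]

for sentence in sentences:
    print(sentence)
```

Unpacking in the loop header keeps the code readable compared to indexing each record with [0], [1], and [2].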
Tear records apart and turn them back into columns
To achieve this, combine zip with Python's asterisk (argument unpacking) syntax:
>>> cols = list(zip(*table))
>>> cols
[('Elon', 'Steve', 'Bob'),
('Musk', 'Jobs', 'Dorf'),
('builds rockets', 'grows Macs', 'helps startups')]
Once again, we have columns instead of records! For example, cols[2] holds the professions and cols[0] contains only first names.
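It may help to see why the asterisk works: zip(*table) is just zip called with every record as a separate argument. The sketch below checks that equivalence and shows that sewing the columns again restores the table. Note that the columns come back as tuples, not lists.

```python
table = [
    ("Elon", "Musk", "builds rockets"),
    ("Steve", "Jobs", "grows Macs"),
    ("Bob", "Dorf", "helps startups"),
]

# The asterisk spreads the records out as separate arguments,
# so these two calls are equivalent for a table of three records.
unpacked = list(zip(*table))
spelled_out = list(zip(table[0], table[1], table[2]))

# Columns can also be unpacked directly into named variables.
first_names, last_names, professions = zip(*table)

# Sewing the columns back together restores the original table.
round_trip = list(zip(first_names, last_names, professions))
print(round_trip == table)
```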
Iterables
Actually, eagerly wrapping every iterable in a list is no good. In the real world, columns or records can be very large and may not even fit in memory. Every time we turn an iterable into a list, we allocate memory to store all the data.
Furthermore, columns or records may themselves arrive as iterables, and we may need to process them further or stream the result to another physical device.
To avoid unnecessary allocations, we should keep iterables as iterables and not convert them into lists too early.
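Here is a rough sketch of what staying lazy can look like. The read_column generator is a hypothetical stand-in for a column streamed from a database or a file; it is an assumption for illustration, not a real API. Nothing is read until a consumer asks for the next record.

```python
from itertools import islice


def read_column(values):
    """Hypothetical stand-in for a column streamed from a database or file."""
    for value in values:
        yield value


first_names = read_column(["Elon", "Steve", "Bob"])
last_names = read_column(["Musk", "Jobs", "Dorf"])

# zip is lazy: no values have been pulled from the columns yet.
records = zip(first_names, last_names)

# A generator expression keeps the processing step lazy as well.
full_names = (f"{first} {last}" for first, last in records)

# Only now are the first two records actually read and processed.
first_two = list(islice(full_names, 2))
print(first_two)

# The remaining record is still waiting in the pipeline.
rest = list(full_names)
print(rest)
```

With this shape, only one record at a time needs to live in memory, and a list is built only at the point where one is really needed.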
Flame 🔥. Comment. Make cool things!