Disclaimer: If you write Python on a daily basis you will find nothing new in this post. It’s for people who use Python only occasionally, like Ops guys, and forget or misuse its import system. Nonetheless, the code is written with Python 3.6 type annotations to entertain an experienced Python reader. As usual, if you find any mistakes, please let me know!
Modules
Let’s start with a common Python stanza:
if __name__ == '__main__':
    invoke_the_real_code()
A lot of people, and I’m not an exception, write it as a ritual without trying to understand it. We vaguely know that this snippet makes a difference between invoking your code from the CLI and importing it. But let’s try to understand why we really need it.
For illustration, assume that we’re writing some pizza shop software. It’s on GitHub. Here is the pizza.py file.
# pizza.py file
import math


class Pizza:
    name: str = ''
    size: int = 0
    price: float = 0

    def __init__(self, name: str, size: int, price: float) -> None:
        self.name = name
        self.size = size
        self.price = price

    def area(self) -> float:
        return math.pi * math.pow(self.size / 2, 2)

    def awesomeness(self) -> int:
        if self.name == 'Carbonara':
            return 9000
        return self.size // int(self.price) * 100


print('pizza.py module name is %s' % __name__)
if __name__ == '__main__':
    print('Carbonara is the most awesome pizza.')
I’ve added printing of the magical __name__ variable to see how it may change.
OK, first, let’s run it as a script:
$ python3 pizza.py
pizza.py module name is __main__
Carbonara is the most awesome pizza.
Indeed, the __name__ global variable is set to __main__ when we invoke the file from the CLI.
But what if we import it from another file? Here is the menu.py source code:
# menu.py file
from typing import List
from pizza import Pizza

MENU: List[Pizza] = [
    Pizza('Margherita', 30, 10.0),
    Pizza('Carbonara', 45, 14.99),
    Pizza('Marinara', 35, 16.99),
]

if __name__ == '__main__':
    print(MENU)
Run menu.py:
$ python3 menu.py
pizza.py module name is pizza
[<pizza.Pizza object at 0x7fbbc1045470>, <pizza.Pizza object at 0x7fbbc10454e0>, <pizza.Pizza object at 0x7fbbc1045b38>]
And now we see 2 things:
- The top-level print statement from pizza.py was executed on import
- __name__ in pizza.py is now set to the filename without the .py suffix
So, the thing is, __name__ is the global variable that holds the name of the current Python module.
- The module name is set by the interpreter in the __name__ variable
- When a module is invoked from the CLI its name is set to __main__
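If you want to poke at this without creating files by hand, here is a minimal sketch. It writes a throwaway module (the name whoami.py is made up for the illustration) to a temporary directory, runs it once as a script and loads it once as a module, and compares the __name__ it sees each time.

```python
import importlib.util
import subprocess
import sys
import tempfile
from pathlib import Path

# A throwaway module whose only job is to report its own __name__.
SOURCE = "print(__name__)\n"


def name_when_run_as_script(path: Path) -> str:
    """Run the file with the current interpreter and capture what it prints."""
    result = subprocess.run(
        [sys.executable, str(path)],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout.strip()


def name_when_imported(path: Path) -> str:
    """Load the same file as a module and read its __name__ attribute."""
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the top-level code, like import does
    return module.__name__


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "whoami.py"  # hypothetical file name
    demo.write_text(SOURCE)
    as_script = name_when_run_as_script(demo)
    as_module = name_when_imported(demo)

print("run as a script:", as_script)
print("loaded as a module:", as_module)
```

The same file reports __main__ in the first case and its filename without .py in the second, which is exactly what the if __name__ == '__main__' check relies on.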
So what is a module, after all? It’s really simple: a module is a file containing Python code that you can execute with the interpreter (the python program) or import from other modules.
- A Python module is just a file with Python code
Just like when executing, when a module is imported its top-level statements are executed. But be aware that they are executed only once per interpreter process, even if you import the module several times or from different files, because the loaded module is cached in sys.modules.
- When you import a module it’s executed
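The "executed only once" part is easy to check. The sketch below is an illustration under my own assumptions: the module name noisy.py is hypothetical, and I capture stdout just to count how many times the top-level print fires.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module that announces every execution of its top level.
    Path(tmp, "noisy.py").write_text("print('noisy.py top-level code ran')\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            import noisy  # first import: the file is executed
            import noisy  # second import: served from the sys.modules cache
            again = importlib.import_module("noisy")  # same cached object
    finally:
        sys.path.remove(tmp)

runs = captured.getvalue().count("top-level code ran")
print("top-level code executed", runs, "time(s)")
print("same module object:", again is sys.modules["noisy"])
```

Every later import just hands back the object already stored in sys.modules, so the print runs a single time.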
Because modules are just plain files, there is a simple way to import them. Just take the filename, remove the .py extension and put it in the import statement.
- To import a module you use the filename without the .py extension
What is interesting is that __name__ is set to the filename regardless of how you import it: with import pizza as broccoli, __name__ will still be pizza. So
- When imported, the module name is set to the filename without the .py extension even if it’s renamed with import module as othername
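You can see the same thing with any standard library module. In this small sketch I use json only because it is always available; the alias is purely a local name and the interpreter never registers it.

```python
import sys

import json as broccoli  # a stdlib module under a silly local alias

# The alias only binds a local name; the module still knows its real name.
print(broccoli.__name__)

# The interpreter caches the module under its real name, not the alias.
print("json" in sys.modules, "broccoli" in sys.modules)
```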
But what if the module that we import is not located in the same directory, how can we import it? The answer is in the module search path, which we’ll eventually discover while discussing packages.
Packages
- A package is a namespace for a collection of modules
The namespace part is important because by itself a package doesn’t provide any functionality – it only gives you a way to group a bunch of your modules.
There are 2 cases where you really want to put modules into a package. The first is to isolate the definitions of one module from another. In our pizza module, we have a Pizza class that might conflict with someone else’s Pizza class (and there are some pizza packages on PyPI).
The second case is if you want to distribute your code because
- A package is the minimal unit of code distribution in Python
Everything that you see on PyPI and install via pip is a package, so in order to share your awesome stuff, you have to make a package out of it.
Alright, assume we’re convinced and want to convert our 2 modules into a nice package. To do this we need to create a directory with an empty __init__.py file and move our files into it:
pizzapy/
├── __init__.py
├── menu.py
└── pizza.py
And that’s it – now you have a pizzapy package!
- To make a package, create a directory with an __init__.py file
Remember that a package is a namespace for modules, so usually you don’t import the package itself, you import a module from the package.
>>> import pizzapy.menu
pizza.py module name is pizza
>>> pizzapy.menu.MENU
[<pizza.Pizza object at 0x7fa065291160>, <pizza.Pizza object at 0x7fa065291198>, <pizza.Pizza object at 0x7fa065291a20>]
If you do the import that way, it may seem too verbose because you need to use the fully qualified name. I guess that’s intentional because one of the lines in the Zen of Python is “explicit is better than implicit”.
Anyway, you can always use the from package import module form to shorten names:
>>> from pizzapy import menu
pizza.py module name is pizza
>>> menu.MENU
[<pizza.Pizza object at 0x7fa065291160>, <pizza.Pizza object at 0x7fa065291198>, <pizza.Pizza object at 0x7fa065291a20>]
Package init
Remember how we put an __init__.py file in a directory and it magically became a package? That’s a great example of convention over configuration – we don’t need to describe any configuration or register anything. Any directory with an __init__.py is by convention a Python package.
Besides making a package, __init__.py serves one more purpose – package initialization. That’s why it’s called init after all! Initialization is triggered on package import; in other words, importing a package executes its __init__.py.
- When you import a package, the __init__.py module of the package is executed
In the __init__ module you can do anything you want, but most commonly it’s used for package initialization or for setting the special __all__ variable. The latter controls what a star import (from package import *) brings in.
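To make the __all__ part concrete, here is a rough sketch. The package name toppings and both of its variables are invented for the example; the package is generated in a temporary directory so the snippet runs on its own.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "toppings")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        "__all__ = ['cheese']\n"
        "cheese = 'mozzarella'\n"
        "pineapple = 'not listed in __all__'\n"
    )
    sys.path.insert(0, tmp)
    try:
        namespace = {}
        # Star import is only allowed at module level, so run it via exec
        # into a scratch namespace that we can inspect afterwards.
        exec("from toppings import *", namespace)
    finally:
        sys.path.remove(tmp)

# Drop __builtins__, which exec adds to the namespace on its own.
star_imported = sorted(name for name in namespace if not name.startswith("__"))
print(star_imported)
```

Only the names listed in __all__ come through. Without __all__, a star import would bring in every public name, that is, every name that doesn’t start with an underscore.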
And because Python is awesome we can do pretty much anything in the __init__ module, even really strange things. Suppose we don’t like the explicitness of import and want to drag all of the modules’ symbols up to the package level, so we don’t have to remember the actual module names.
To do that we can import everything from the menu and pizza modules in __init__.py like this:
# pizzapy/__init__.py
from pizzapy.pizza import *
from pizzapy.menu import *
See:
>>> import pizzapy
pizza.py module name is pizzapy.pizza
pizza.py module name is pizza
>>> pizzapy.MENU
[<pizza.Pizza object at 0x7f1bf03b8828>, <pizza.Pizza object at 0x7f1bf03b8860>, <pizza.Pizza object at 0x7f1bf03b8908>]
No more pizzapy.menu.MENU or menu.MENU :-) That way it kinda works like packages in Go, but note that this is discouraged because you are abusing Python, and if you check in such code you’re gonna have a bad time at code review. I’m showing you this just for illustration, don’t blame me!
You could rewrite the import more succinctly like this:
# pizzapy/__init__.py
from .pizza import *
from .menu import *
This is just another syntax for doing the same thing, and it is called a relative import. Let’s look at it closer.
Absolute and relative imports
Of the 2 code pieces above, the first spells out the full package name (an absolute import) and the second uses a leading dot (an explicit relative import). The dot form is the only way of doing a relative import, because since Python 3 all imports are absolute by default (as in PEP 328), meaning that a plain import is looked up on the module search path and not in the current package. This is needed to avoid shadowing of standard modules: without it, creating your own sys.py module in a package and doing import sys there could override the standard library sys module.
- Since Python 3 all imports are absolute by default – names are resolved through the module search path, not relative to the current package
But if your package has a module called sys and you want to import it into another module of the same package, you have to be explicit again and write from package.module import somesymbol or from .module import somesymbol. That funny single dot before the module name is read as “current package”.
- To import a module of your own package, prepend the module name with the package name (absolute import) or a dot (relative import)
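Here is a small sketch of that difference. Everything in it is made up for the illustration: a temporary package called saladbar contains its own string.py, which clashes with the standard library string module, and a probe module imports the name both ways.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "saladbar")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # A package-local module that clashes with the stdlib "string" module.
    (pkg / "string.py").write_text("ORIGIN = 'package-local module'\n")
    (pkg / "probe.py").write_text(
        "import string as absolute_string    # plain import: module search path\n"
        "from . import string as relative_string  # dot: current package\n"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        probe = importlib.import_module("saladbar.probe")
    finally:
        sys.path.remove(tmp)

print(probe.absolute_string.__name__)  # the standard library module
print(probe.relative_string.__name__)  # the module inside the package
```

The plain import still finds the standard library module, while the dotted one picks up the package’s own file.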
Executable package
In Python you can invoke a module with the python3 -m <module> construction.
$ python3 -m pizza
pizza.py module name is __main__
Carbonara is the most awesome pizza.
But packages can also be invoked this way:
$ python3 -m pizzapy
/usr/bin/python3: No module named pizzapy.__main__; 'pizzapy' is a package and cannot be directly executed
As you can see, it needs a __main__ module, so let’s implement it:
# pizzapy/__main__.py
from pizzapy.menu import MENU

print('Awesomeness of pizzas:')
for pizza in MENU:
    print(pizza.name, pizza.awesomeness())
And now it works:
$ python3 -m pizzapy
pizza.py module name is pizza
Awesomeness of pizzas:
Margherita 300
Carbonara 9000
Marinara 200
- Adding __main__.py makes a package executable (invoke it with python3 -m package)
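If you’d like to try this in isolation, the sketch below generates a tiny package in a temporary directory and runs it with -m from there. The package name calzone and the message it prints are my own inventions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "calzone")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # __main__.py is what "python3 -m calzone" actually executes.
    (pkg / "__main__.py").write_text(
        "print('running', __package__, 'as', __name__)\n"
    )
    result = subprocess.run(
        [sys.executable, "-m", "calzone"],
        cwd=tmp,  # -m looks the package up relative to the current directory
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )

output = result.stdout.strip()
print(output)
```

Inside __main__.py the module name is __main__, just like for a script, but __package__ still tells you which package you are running in.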
Import sibling packages
And the last thing I want to cover is the import of sibling packages. Suppose we have a sibling package pizzashop:
.
├── pizzapy
│   ├── __init__.py
│   ├── __main__.py
│   ├── menu.py
│   └── pizza.py
└── pizzashop
    ├── __init__.py
    └── shop.py
# pizzashop/shop.py
import pizzapy.menu
print(pizzapy.menu.MENU)
Now, sitting in the top level directory, if we try to invoke shop.py like this
$ python3 pizzashop/shop.py
Traceback (most recent call last):
  File "pizzashop/shop.py", line 1, in <module>
    import pizzapy.menu
ModuleNotFoundError: No module named 'pizzapy'
we get an error saying that our pizzapy module is not found. But if we invoke it as a part of the package
$ python3 -m pizzashop.shop
pizza.py module name is pizza
[<pizza.Pizza object at 0x7f372b59ccc0>, <pizza.Pizza object at 0x7f372b59ccf8>, <pizza.Pizza object at 0x7f372b59cda0>]
it suddenly works. What the hell is going on here?
The explanation for this lies in the Python module search path, and it’s well described in the documentation on modules.
The module search path is a list of directories (available at runtime as sys.path) that the interpreter uses to locate modules. It is initialized with the path to the Python standard modules (/usr/lib64/python3.6), site-packages where pip puts everything you install globally, and also a directory that depends on how you run a module. If you run a module as a file, like python3 pizzashop/shop.py, the path to the containing directory (pizzashop) is added to sys.path. Otherwise, including running with the -m option, the current directory (as in pwd) is added to the module search path. We can check it by printing sys.path in pizzashop/shop.py:
$ pwd
/home/avd/dev/python-imports
$ tree
.
├── pizzapy
│   ├── __init__.py
│   ├── __main__.py
│   ├── menu.py
│   └── pizza.py
└── pizzashop
    ├── __init__.py
    └── shop.py
$ python3 pizzashop/shop.py
['/home/avd/dev/python-imports/pizzashop',
'/usr/lib64/python36.zip',
'/usr/lib64/python3.6',
'/usr/lib64/python3.6/lib-dynload',
'/usr/local/lib64/python3.6/site-packages',
'/usr/local/lib/python3.6/site-packages',
'/usr/lib64/python3.6/site-packages',
'/usr/lib/python3.6/site-packages']
Traceback (most recent call last):
  File "pizzashop/shop.py", line 5, in <module>
    import pizzapy.menu
ModuleNotFoundError: No module named 'pizzapy'
$ python3 -m pizzashop.shop
['',
'/usr/lib64/python36.zip',
'/usr/lib64/python3.6',
'/usr/lib64/python3.6/lib-dynload',
'/usr/local/lib64/python3.6/site-packages',
'/usr/local/lib/python3.6/site-packages',
'/usr/lib64/python3.6/site-packages',
'/usr/lib/python3.6/site-packages']
pizza.py module name is pizza
[<pizza.Pizza object at 0x7f2f75747f28>, <pizza.Pizza object at 0x7f2f75747f60>, <pizza.Pizza object at 0x7f2f75747fd0>]
As you can see, in the first case we have the pizzashop dir in our path and so we cannot find the sibling pizzapy package, while in the second case the current dir (denoted as '') is in sys.path and it contains both packages.
- Python has a module search path available at runtime as sys.path
- If you run a module as a script file, the containing directory is added to sys.path, otherwise, the current directory is added to it
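The two rules can be checked with a throwaway copy of the layout. In this sketch the generated shop.py only prints the first sys.path entry (that body is my simplification, not the real file), resolved to a full path because newer interpreters already store a full path there while older ones use ''.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Simplified stand-in for shop.py: report only the first sys.path entry.
SHOP_SOURCE = "import os, sys\nprint(os.path.realpath(sys.path[0]))\n"


def first_path_entry(args, cwd):
    """Run the interpreter with the given arguments and return what it prints."""
    result = subprocess.run(
        [sys.executable] + args,
        cwd=cwd, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    shop_dir = Path(root, "pizzashop")
    shop_dir.mkdir()
    (shop_dir / "__init__.py").write_text("")
    (shop_dir / "shop.py").write_text(SHOP_SOURCE)

    as_script = first_path_entry([os.path.join("pizzashop", "shop.py")], root)
    as_module = first_path_entry(["-m", "pizzashop.shop"], root)

print("script file :", as_script)
print("-m module   :", as_module)
```

Run as a file, the first entry is the pizzashop directory itself; run with -m, it is the directory you started from, which is why sibling packages become visible.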
This problem of importing a sibling package often arises when people put a bunch of test or example scripts in a directory or package next to the main package. It is a recurring question on StackOverflow.
The good solution is to avoid the problem – put tests or examples in the package itself and use relative imports. The dirty solution is to modify sys.path at runtime (yay, dynamic!) by adding the parent directory of the needed package. People actually do this even though it’s an awful hack.
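For completeness, this is roughly what the dirty solution tends to look like. Treat it as an illustration of the hack and not as a recommendation; the GREETING variable and the generated files are assumptions made so the snippet runs by itself.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The hack: before importing the sibling package, shop.py puts the parent
# of its own directory at the front of sys.path.
SHOP_SOURCE = """\
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import pizzapy
print(pizzapy.GREETING)
"""

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pizzapy").mkdir()
    # GREETING is a hypothetical variable, just something to print.
    Path(tmp, "pizzapy", "__init__.py").write_text(
        "GREETING = 'hello from pizzapy'\n"
    )
    Path(tmp, "pizzashop").mkdir()
    Path(tmp, "pizzashop", "__init__.py").write_text("")
    Path(tmp, "pizzashop", "shop.py").write_text(SHOP_SOURCE)

    # Run shop.py as a plain script file, the case that failed before.
    result = subprocess.run(
        [sys.executable, str(Path("pizzashop", "shop.py"))],
        cwd=tmp, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )

hack_output = result.stdout.strip()
print(hack_output)
```

It works, but now the script depends on where it sits on disk, which is one reason to prefer keeping such scripts inside the package.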
The End!
I hope that after reading this post you’ll have a better understanding of Python imports and can finally decompose that giant script you have in your toolbox without fear. In the end, everything in Python is really simple and even when it is not sufficient for your case, you can always monkey patch anything at runtime.
And on that note, I would like to stop and thank you for your attention. Until next time!