Asynchronous I/O was introduced in Python 3.5 as an alternative to threads for handling concurrency. The promises of asynchronous I/O, and of the asyncio implementation in Python, are that by not spawning memory-expensive OS threads, systems use fewer resources and scale better. Also, in asyncio, schedule points are explicit through the await syntax, whereas in thread-based concurrency the GIL may be released at hard-to-guess points in the code, so asyncio-based concurrent systems are easier to reason about and to debug. Finally, asyncio tasks can be cancelled, which is not easily done with threads.
But to really benefit from these promises, it is very important not to make blocking calls inside async coroutines. A blocking call can be a network call, a file system call, a sleep call, and so on.
These blocking calls are harmful because, under the hood, asyncio uses a single-threaded event loop on which coroutines run concurrently. So if a blocking call is made in a coroutine, it blocks the entire event loop and all the other coroutines, which hurts the overall performance of the application.
Here is an example where a blocking call prevents concurrent execution of the code:
import asyncio
import datetime
import time

async def example(name):
    print(f"{datetime.datetime.now()}: {name} start")
    time.sleep(1)  # time.sleep is a blocking function
    print(f"{datetime.datetime.now()}: {name} stop")

async def main():
    await asyncio.gather(example("1"), example("2"))

asyncio.run(main())
When run, this outputs something like:
2025-01-07 18:50:15.327677: 1 start
2025-01-07 18:50:16.328330: 1 stop
2025-01-07 18:50:16.328404: 2 start
2025-01-07 18:50:17.333159: 2 stop
We see that the two coroutines were not run concurrently: the second one only started after the first one had finished.
To overcome this, you need to use a non-blocking equivalent or defer the execution to a thread pool (a thread-pool sketch follows further below). Here is the version using the non-blocking equivalent:
import asyncio
import datetime

async def example(name):
    print(f"{datetime.datetime.now()}: {name} start")
    await asyncio.sleep(1)  # the blocking time.sleep call is replaced by the non-blocking asyncio.sleep coroutine
    print(f"{datetime.datetime.now()}: {name} stop")

async def main():
    await asyncio.gather(example("1"), example("2"))

asyncio.run(main())
When run, this outputs something like:
2025-01-07 18:53:53.579738: 1 start
2025-01-07 18:53:53.579797: 2 start
2025-01-07 18:53:54.580463: 1 stop
2025-01-07 18:53:54.580572: 2 stop
Here the two coroutines were run concurrently.
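When there is no asyncio-native equivalent of a blocking function, the other option mentioned above is to defer the call to a thread pool. Here is a minimal sketch of that approach using asyncio.to_thread (available since Python 3.9), reusing the same example as above:

import asyncio
import datetime
import time

async def example(name):
    print(f"{datetime.datetime.now()}: {name} start")
    # Run the blocking call in the default thread-pool executor so the
    # event loop stays free to run other coroutines.
    await asyncio.to_thread(time.sleep, 1)
    print(f"{datetime.datetime.now()}: {name} stop")

async def main():
    await asyncio.gather(example("1"), example("2"))

asyncio.run(main())

Run this way, the two coroutines also overlap, because only the worker thread is blocked, not the event loop.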
Now the problem is that it is not always easy to tell whether a method is blocking, especially if the code base is big or uses third-party libraries. Sometimes blocking calls are made in deeply buried parts of the code.
For instance, is this code blocking?
import blockbuster
from importlib.metadata import version

async def get_version():
    return version("blockbuster")
Did Python load the package metadata in memory at startup? Is it done when loading the blockbuster module? Or when we call version()? Is the result cached so that subsequent calls are non-blocking?
The correct answer is that it happens when calling version(), and it involves reading the METADATA file of the installed package. And the result is not cached. So version() is a blocking call and should always be deferred to a thread. This is hard to know without diving into the code of importlib.
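As an illustration, one way to make the snippet above safe, assuming the same get_version() coroutine, is to defer the metadata read to a worker thread:

import asyncio
from importlib.metadata import version

async def get_version():
    # version() reads the METADATA file of the installed package from disk,
    # so run it in a worker thread instead of calling it in the event loop.
    return await asyncio.to_thread(version, "blockbuster")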
One way of detecting blocking calls is to activate asyncio's debug mode, which logs blocking calls that take too long. But that's not the most effective way: many blocking calls shorter than the trigger timeout are still harmful for performance, and blocking times in test/development may differ from production. For instance, a database call may take longer in production if it has to fetch a lot of data.
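For reference, debug mode can be enabled by passing debug=True to asyncio.run() (or by setting the PYTHONASYNCIODEBUG environment variable); the event loop then logs callbacks and coroutine steps that run longer than loop.slow_callback_duration, which defaults to 100 ms. A minimal sketch:

import asyncio
import time

async def main():
    time.sleep(1)  # blocking call hidden inside a coroutine

# With debug=True, the loop logs a warning along the lines of
# "Executing <Task ...> took 1.0 seconds" because the step exceeded
# loop.slow_callback_duration (0.1 s by default).
asyncio.run(main(), debug=True)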
This is where BlockBuster saves the day!
When activated, BlockBuster will monkey-patch several blocking Python framework methods to make them raise an error if they are called from an asyncio event loop.
The default patched methods include methods from the os, io, time, socket, and sqlite3 modules. For the full list of methods detected by BlockBuster, see the project README.
Then you can activate BlockBuster during your unit tests or in development mode to catch any blocking calls and fix them.
If you know the awesome BlockHound library from the JVM ecosystem, it's the same principle but for Python. BlockHound was a great inspiration for BlockBuster, kudos to its creators.
Let's see how to use BlockBuster on the blocking snippet above.
First, we need to install the blockbuster package:

pip install blockbuster
Then we can use a pytest fixture and the blockbuster_ctx() function to activate BlockBuster at the beginning of each test and deactivate it during teardown (running async tests also requires a plugin such as pytest-asyncio).
import asyncio
import datetime
import time

import pytest

from blockbuster import blockbuster_ctx

async def example(name):
    print(f"{datetime.datetime.now()}: {name} start")
    time.sleep(1)
    print(f"{datetime.datetime.now()}: {name} stop")

async def main():
    await asyncio.gather(example("1"), example("2"))

@pytest.fixture(autouse=True)
def blockbuster():
    with blockbuster_ctx() as bb:
        yield bb

async def test_main():
    await main()
If you run this with pytest, you get:
FAILED test_example.py::test_main - blockbuster.blockbuster.BlockingError: Blocking call to sleep (<module 'time' (built-in)>)
Note: typically, in a real project, the blockbuster() fixture would be set in a conftest.py file.
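For the development-mode use case mentioned earlier, one possible approach, sketched here with hypothetical entry-point names, is to keep the blockbuster_ctx() context manager open around the application entry point:

import asyncio

from blockbuster import blockbuster_ctx

async def main():
    ...  # application code

def run_dev():
    # Keep BlockBuster active for the whole run so any blocking call made
    # from the event loop raises a BlockingError during development.
    with blockbuster_ctx():
        asyncio.run(main())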
Conclusion
I believe BlockBuster is quite useful in asyncio projects. It has already helped me detect a lot of blocking-call issues in projects I work on.
But it's not a silver bullet. In particular, some third-party libraries don't use Python framework methods to interact with the network or the file system and instead wrap a C library. For these, it is possible to add rules in your test setup to trigger on the blocking calls of those libraries. Also, BlockBuster is open source: contributions are very welcome to add rules for your favourite library to the core project.
And if you see issues or things that could be improved, I would be pleased to get your feedback in the project issue tracker.