
Ayush


FastAPI - Concurrency in Python

Recently I delved deep into the FastAPI docs and some other resources to understand how each route is processed and what can be done to optimise FastAPI for scale. This post summarises what I learned.


A little refresher

Before we go into optimising FastAPI, I'd like to give a short tour of a few technical concepts.

Threads & Processes

Threads share their parent process's memory space and are cheap to create.
Processes have separate memory spaces and therefore carry more creation overhead.

Multi-threading & Multi-processing

Multi-threading uses multiple threads within a single process.
Multi-processing utilizes multiple processes, leveraging multiple CPU cores.
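A minimal stdlib sketch of the difference (not FastAPI-specific): the same CPU-bound function run once with a thread pool and once with a process pool. `count_down` is just an illustrative stand-in for heavy work.

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_down(n: int) -> int:
    # CPU-bound busy loop standing in for real work
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    work = [200_000] * 4
    # Threads share memory, but CPU-bound work serialises on the GIL.
    with ThreadPoolExecutor(max_workers=4) as pool:
        thread_results = list(pool.map(count_down, work))
    # Each process has its own interpreter (and its own GIL),
    # so the four tasks can truly run on four cores.
    with ProcessPoolExecutor(max_workers=4) as pool:
        process_results = list(pool.map(count_down, work))
    print(thread_results == process_results)  # same answers either way
```

For CPU-bound work like this, only the process pool can use multiple cores; the thread pool version runs one task at a time under the GIL.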

Concurrency & Parallelism

Concurrency is managing multiple tasks so they make progress together, typically via an event loop.
Parallelism is executing multiple tasks at the same instant, which requires multiple CPU cores.
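Concurrency on a single thread can be sketched with `asyncio`: one event loop interleaves two tasks while each waits on (simulated) I/O, so two 0.2s waits finish in roughly 0.2s total.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # hand control back to the event loop
    return name

async def main() -> tuple[list, float]:
    start = time.perf_counter()
    # Two "requests" interleave on a single thread.
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")  # ~0.2s total, not 0.4s
```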

Python & the GIL

Python's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. However, for I/O-bound operations, the GIL is released, allowing other threads to run while waiting for I/O completion. This makes Python particularly effective for I/O-heavy applications.
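This can be demonstrated with plain threads: `time.sleep` below stands in for a blocking network call, and because the GIL is released during the wait, four 0.2s waits overlap instead of running back to back.

```python
import threading
import time

def io_task() -> None:
    time.sleep(0.2)  # GIL is released while blocked on "I/O"

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")  # ~0.2s total, not 0.8s
```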

Quick FYI: the GIL will likely become optional in a future Python release. (PEP 703)

FastAPI

How route handlers are processed

  • Regular (def) route handlers run in an external threadpool
  • Async route handlers run on the main thread's event loop
  • FastAPI doesn't change how other (utility) functions run
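The dispatch rule above can be sketched with the stdlib alone (FastAPI itself does this via Starlette/AnyIO internally, so `dispatch` here is an illustrative stand-in, not FastAPI's actual code): coroutine handlers are awaited on the event loop, while plain functions are pushed to a threadpool so they don't block it.

```python
import asyncio
import inspect
import time

async def async_handler() -> str:
    await asyncio.sleep(0.1)   # non-blocking wait on the event loop
    return "async"

def sync_handler() -> str:
    time.sleep(0.1)            # blocking call, safe only off the main thread
    return "sync"

async def dispatch(handler):
    if inspect.iscoroutinefunction(handler):
        return await handler()  # runs directly on the event loop
    loop = asyncio.get_running_loop()
    # Plain functions go to the default threadpool executor
    return await loop.run_in_executor(None, handler)

async def main() -> list:
    return await asyncio.gather(dispatch(async_handler), dispatch(sync_handler))

results = asyncio.run(main())
print(results)  # ['async', 'sync']
```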

Optimizing FastAPI

Choose based on your task:

  1. I/O tasks with minimal or no CPU work: use an async route handler and await the I/O
  2. Non-async (blocking) I/O tasks: use a regular (def) route handler
  3. I/O tasks with significant CPU work: use a regular route handler, or an async route handler that queues the task for an external worker (multi-processing)
  4. High-compute tasks: use multi-processing, as above

The reason I suggest regular route handlers for most cases is that we want to keep the main thread free to receive and manage requests. Any blocking code on the main thread (i.e., inside async handlers) would delay incoming requests.

Using multiple processes

  • For containerized environments: Use Docker Swarm/Kubernetes to create workers and use a load balancer
  • For traditional setups or just Docker: Use Gunicorn (for process management) with Uvicorn (workers)
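A typical invocation for the Gunicorn-plus-Uvicorn setup might look like this (assuming your FastAPI instance is named `app` in `main.py`; adjust the worker count to your CPU cores):

```shell
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```

Gunicorn manages the worker processes (restarts, signals), while each Uvicorn worker runs its own event loop serving the app.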

Some resources I found to be of great help
