Handling CPU-bound Tasks in FastAPI
Using ThreadPoolExecutor + uvicorn --workers for Effective Parallel Processing
Background
FastAPI is known for its powerful asynchronous capabilities, especially with I/O-bound tasks. However, when it comes to CPU-bound workloads (like OCR, image processing, or heavy computation), async def alone doesn't bring much performance gain.
This post walks you through how I improved the responsiveness and concurrency of my FastAPI app using:
• ThreadPoolExecutor for CPU-bound operations
• Uvicorn's --workers option for true multi-processing
The Problem
While building an API for generating scripts using LLMs, I ran into performance bottlenecks due to:
• Intensive OCR and image preprocessing (e.g., with PyMuPDF, OpenCV)
• Running dozens of LLM summarization and generation calls
• Handling large text files and saving structured data
These operations were slowing down requests and leading to poor concurrency.
Solution 1: ThreadPoolExecutor for CPU-bound Async
from concurrent.futures import ThreadPoolExecutor
import asyncio

executor = ThreadPoolExecutor()

def heavy_cpu_task(data):
    # Do CPU-intensive work here
    ...

async def process(data):
    # get_running_loop() must be called from inside async code;
    # asyncio.get_event_loop() is deprecated for this use
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop stays free
    result = await loop.run_in_executor(executor, heavy_cpu_task, data)
    return result
Why it works:
• Runs CPU-heavy functions in background threads
• Keeps the FastAPI event loop responsive (note: pure-Python work still shares the GIL, but C extensions such as OpenCV and PyMuPDF release it during heavy operations)
• Ideal for tasks like OCR, parsing, or heavy LLM formatting
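To make the pattern concrete, here is a self-contained sketch (stdlib only, no FastAPI dependency; blocking_work and quick_ping are illustrative names, not from the original service) showing that the event loop stays responsive while a blocking task runs in the executor:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def blocking_work(n: int) -> int:
    time.sleep(0.2)  # stand-in for OCR / image preprocessing
    return n * n

async def quick_ping() -> str:
    # Represents a lightweight request the loop can still serve
    return "pong"

async def main() -> tuple:
    loop = asyncio.get_running_loop()
    # Schedule the blocking call on the thread pool...
    heavy = loop.run_in_executor(executor, blocking_work, 7)
    # ...and the loop is immediately free to run other coroutines
    ping = await quick_ping()
    return await heavy, ping
```

Running `asyncio.run(main())` returns `(49, 'pong')`; in FastAPI the same `run_in_executor` call would sit inside an `async def` endpoint.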
Solution 2: Use --workers for Multi-Processing
Even with run_in_executor, a single FastAPI process may not be enough under heavy load. That’s where multi-process scaling comes in.
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 8
Why it helps:
• Launches 8 independent processes, each handling requests
• Bypasses the limits of Python's GIL, since each worker is a separate interpreter process
• Great for high concurrency or batch processing
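The per-process effect can be sketched with the stdlib ProcessPoolExecutor, which, like uvicorn's workers, runs tasks in separate interpreter processes, each with its own GIL (cpu_bound is an illustrative stand-in, not code from the original service):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    # Pure-Python CPU work; threads would serialize on the GIL here,
    # but separate processes run it truly in parallel
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(cpu_bound, [10_000, 20_000, 30_000])))
```

uvicorn's `--workers` applies the same principle at the server level: each worker is a full copy of the app, so eight workers can keep eight CPU cores busy at once.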
Combined Architecture
[ Client ]
↓
[ One of N Uvicorn workers (multi-process) ]
↓
[ FastAPI (async endpoints) ]
↓
[ ThreadPoolExecutor → background CPU-bound task ]
This setup allowed:
• Immediate response to the client
• Non-blocking heavy computation
• Stable parallelism even under high traffic
When to Use What
| Task Type | Recommended Approach |
| --- | --- |
| I/O-bound (HTTP, DB) | async def |
| CPU-bound | run_in_executor + ThreadPoolExecutor |
| High concurrency | uvicorn --workers |
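As a side note, on Python 3.9+ the executor row of this table can also be written with `asyncio.to_thread`, which submits the call to the event loop's default thread pool; a minimal sketch (`heavy` is a placeholder function):

```python
import asyncio

def heavy(n: int) -> int:
    return n * n  # placeholder for CPU-bound work

async def main() -> int:
    # Roughly equivalent to loop.run_in_executor(None, heavy, 6)
    return await asyncio.to_thread(heavy, 6)

print(asyncio.run(main()))  # 36
```

Use an explicit ThreadPoolExecutor when you need to control the pool size or share it across endpoints, as in the solution above.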
Result
After these changes, we achieved:
• Better scalability under concurrent users
• Faster response times for long-running tasks
• More modular and maintainable service logic
Takeaway
Asynchronous programming in FastAPI is powerful—but not a silver bullet. When your app performs CPU-heavy tasks like LLM calls, image processing, or OCR, consider combining ThreadPoolExecutor with uvicorn --workers to unlock true parallel performance.