Changyu Lee

Dev Log [05.29.25] : Handling CPU-bound Tasks in FastAPI

Published: 2025/05/29
Keywords: ASGI, FastAPI, ThreadPoolExecutor

Handling CPU-bound Tasks in FastAPI

Using ThreadPoolExecutor + uvicorn --workers for Effective Parallel Processing

Background

FastAPI is known for its powerful asynchronous capabilities, especially with I/O-bound tasks. However, when it comes to CPU-bound workloads (like OCR, image processing, or heavy computation), async def alone brings little gain: CPU-heavy work inside a coroutine blocks the event loop for every other request.
This post walks you through how I improved the responsiveness and concurrency of my FastAPI app using:
ThreadPoolExecutor for CPU-bound operations
Uvicorn’s --workers option for true multi-processing

The Problem

While building an API for generating scripts using LLMs, I ran into performance bottlenecks due to:
Intensive OCR and image preprocessing (e.g., with PyMuPDF, OpenCV)
Running dozens of LLM summarization and generation calls
Handling large text files and saving structured data
These operations were slowing down requests and leading to poor concurrency.

Solution 1: ThreadPoolExecutor for CPU-bound Work

from concurrent.futures import ThreadPoolExecutor
import asyncio

executor = ThreadPoolExecutor()

def heavy_cpu_task(data):
    # Do CPU-intensive work here
    ...

async def process(data):
    # Offload to a pool thread; await must live inside a coroutine
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, heavy_cpu_task, data)
Python

Why it works:

Runs CPU-heavy functions in background threads
Keeps the FastAPI event loop responsive
Ideal for tasks like OCR, parsing, or heavy LLM formatting
One caveat: inside a single process the GIL still serializes pure-Python bytecode, so threads pay off most when the heavy lifting happens in C extensions (OpenCV, PyMuPDF) that release the GIL; either way, the event loop stays free.
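
Here is a minimal sketch of how this wires into an actual route. extract_text is a hypothetical stand-in for an OCR/parsing routine, not part of the original code:

from concurrent.futures import ThreadPoolExecutor
import asyncio

from fastapi import FastAPI, UploadFile

app = FastAPI()
executor = ThreadPoolExecutor()

def extract_text(pdf_bytes: bytes) -> str:
    # Hypothetical CPU-bound OCR / parsing step (e.g., PyMuPDF + OpenCV)
    ...

@app.post("/ocr")
async def ocr(file: UploadFile):
    pdf_bytes = await file.read()  # async I/O: does not block the loop
    loop = asyncio.get_running_loop()
    # CPU-bound work runs on a pool thread; the loop keeps serving others
    text = await loop.run_in_executor(executor, extract_text, pdf_bytes)
    return {"text": text}
Python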

Solution 2: Use --workers for Multi-Processing

Even with run_in_executor, a single FastAPI process may not be enough under heavy load. That’s where multi-process scaling comes in.
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 8
Shell

Why it helps:

Launches 8 independent processes, each handling requests
Bypasses the limitations of Python’s GIL
Great for high concurrency or batch processing
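
If you start the server from Python instead of the CLI, uvicorn.run exposes the same option; one worker per CPU core is a rough starting point (the main:app import string below is an assumption about your project layout):

import multiprocessing

import uvicorn

if __name__ == "__main__":
    # workers only takes effect when the app is given as an import string
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=8000,
        workers=multiprocessing.cpu_count(),  # rule of thumb: one per core
    )
Python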

Combined Architecture

[ Client ]
    ↓
[ FastAPI (async endpoints) ]
    ↓
[ ThreadPoolExecutor → background CPU-bound task ]
    ↓
[ One of multiple Uvicorn workers (multi-process) ]
Plain Text
This setup gave me:
An event loop that keeps accepting new requests while heavy work runs
Non-blocking heavy computation in background threads
Stable parallelism across processes, even under high traffic
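
One sizing detail worth keeping in mind: every Uvicorn worker is its own process with its own pool, so the total thread count is workers × max_workers. A sketch of bounding the pool per process (the numbers are illustrative, not a recommendation):

import os
from concurrent.futures import ThreadPoolExecutor

# With 8 worker processes, an unbounded pool in each one can
# oversubscribe the CPU; bound the per-process pool explicitly.
WORKERS = 8  # must match the --workers value you launch with
threads_per_process = max(1, (os.cpu_count() or 1) // WORKERS)
executor = ThreadPoolExecutor(max_workers=threads_per_process)
Python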

When to Use What

Task Type               Recommended Approach
I/O-bound (HTTP, DB)    async def
CPU-bound               run_in_executor + ThreadPoolExecutor
High concurrency        uvicorn --workers

Result

After these changes, I achieved:
Better scalability under concurrent users
Faster response times with long-running tasks
More modular and maintainable service logic

Takeaway

Asynchronous programming in FastAPI is powerful—but not a silver bullet. When your app performs CPU-heavy tasks like LLM calls, image processing, or OCR, consider combining ThreadPoolExecutor with uvicorn --workers to unlock true parallel performance.