Changyu Lee

Dev Log [05.29.25] : Handling CPU-bound Tasks in FastAPI

Published: 2025/05/29
Keywords: ASGI, FastAPI, ThreadPoolExecutor

Handling CPU-bound Tasks in FastAPI

Using ThreadPoolExecutor + uvicorn --workers for Effective Parallel Processing

Background

FastAPI is known for its powerful asynchronous capabilities, especially with I/O-bound tasks. However, when it comes to CPU-bound workloads (like OCR, image processing, or heavy computation), async def alone brings little gain: CPU-heavy work inside a coroutine blocks the event loop for every other request.
This post walks you through how I improved the responsiveness and concurrency of my FastAPI app using:
ThreadPoolExecutor for CPU-bound operations
Uvicorn’s --workers option for true multi-processing

The Problem

While building an API for generating scripts using LLMs, I ran into performance bottlenecks due to:
Intensive OCR and image preprocessing (e.g., with PyMuPDF, OpenCV)
Running dozens of LLM summarization and generation calls
Handling large text files and saving structured data
These operations were slowing down requests and leading to poor concurrency.

Solution 1: ThreadPoolExecutor for CPU-bound Work

from concurrent.futures import ThreadPoolExecutor
import asyncio

executor = ThreadPoolExecutor()

def heavy_cpu_task(data):
    # Do CPU-intensive work here
    ...

async def process(data):
    # Offload to a pool thread; await must live inside a coroutine
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, heavy_cpu_task, data)
Python

Why it works:

Runs CPU-heavy functions in background threads
Keeps the FastAPI event loop responsive
Ideal for tasks like OCR, parsing, or heavy LLM formatting
One caveat: inside a single process the GIL still serializes pure-Python bytecode, so threads pay off most when the heavy lifting happens in C extensions (OpenCV, PyMuPDF) that release the GIL; either way, the event loop stays free.
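
Here is a minimal sketch of how this wires into an actual route. extract_text is a hypothetical stand-in for an OCR/parsing routine, not part of the original code:

from concurrent.futures import ThreadPoolExecutor
import asyncio

from fastapi import FastAPI, UploadFile

app = FastAPI()
executor = ThreadPoolExecutor()

def extract_text(pdf_bytes: bytes) -> str:
    # Hypothetical CPU-bound OCR / parsing step (e.g., PyMuPDF + OpenCV)
    ...

@app.post("/ocr")
async def ocr(file: UploadFile):
    pdf_bytes = await file.read()  # async I/O: does not block the loop
    loop = asyncio.get_running_loop()
    # CPU-bound work runs on a pool thread; the loop keeps serving others
    text = await loop.run_in_executor(executor, extract_text, pdf_bytes)
    return {"text": text}
Python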

Solution 2: Use --workers for Multi-Processing

Even with run_in_executor, a single FastAPI process may not be enough under heavy load. That’s where multi-process scaling comes in.
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 8
Shell

Why it helps:

Launches 8 independent processes, each handling requests
Bypasses the limitations of Python’s GIL
Great for high concurrency or batch processing
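
If you start the server from Python instead of the CLI, uvicorn.run exposes the same option; one worker per CPU core is a rough starting point (the main:app import string below is an assumption about your project layout):

import multiprocessing

import uvicorn

if __name__ == "__main__":
    # workers only takes effect when the app is given as an import string
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=8000,
        workers=multiprocessing.cpu_count(),  # rule of thumb: one per core
    )
Python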

Combined Architecture

[ Client ]
    ↓
[ FastAPI (async endpoints) ]
    ↓
[ ThreadPoolExecutor → background CPU-bound task ]
    ↓
[ One of multiple Uvicorn workers (multi-process) ]
Plain Text
This setup gave me:
An event loop that keeps accepting new requests while heavy work runs
Non-blocking heavy computation in background threads
Stable parallelism across processes, even under high traffic
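
One sizing detail worth keeping in mind: every Uvicorn worker is its own process with its own pool, so the total thread count is workers × max_workers. A sketch of bounding the pool per process (the numbers are illustrative, not a recommendation):

import os
from concurrent.futures import ThreadPoolExecutor

# With 8 worker processes, an unbounded pool in each one can
# oversubscribe the CPU; bound the per-process pool explicitly.
WORKERS = 8  # must match the --workers value you launch with
threads_per_process = max(1, (os.cpu_count() or 1) // WORKERS)
executor = ThreadPoolExecutor(max_workers=threads_per_process)
Python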

When to Use What

Task Type               Recommended Approach
I/O-bound (HTTP, DB)    async def
CPU-bound               run_in_executor + ThreadPoolExecutor
High concurrency        uvicorn --workers

Result

After these changes, I achieved:
Better scalability under concurrent users
Faster response times with long-running tasks
More modular and maintainable service logic

Takeaway

Asynchronous programming in FastAPI is powerful—but not a silver bullet. When your app performs CPU-heavy tasks like LLM calls, image processing, or OCR, consider combining ThreadPoolExecutor with uvicorn --workers to unlock true parallel performance.