Fast Inference for Immich ML OCR with PaddleX
This guide provides step-by-step instructions to set up PaddleX for fast OCR inference on GPU for Immich ML. By following these steps, you can significantly improve the performance of OCR tasks in your Immich setup.
Context
Immich ML’s inference engine uses Onnxruntime as the default backend for model inference. However, for OCR tasks, which were introduced in Immich 2.3.x, users have reported very slow GPU performance, sometimes even slower than CPU inference (see the Reddit discussion and GitHub Issue #23462), even when using a powerful GPU or the mobile version of the OCR model.
This performance issue is caused by the Onnxruntime GPU execution engine not being optimized for dynamic input sizes, which are common in OCR tasks. The issue is mentioned in the RapidOCR Docs and was raised with the Immich team in GitHub Issue #23462 and in this GitHub issue comment of mine.
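If you want to see the effect yourself, you can time the same Onnxruntime session across a few different input shapes. Below is a minimal sketch, assuming you have some ONNX text-detection model saved as `det.onnx` (a placeholder name), `onnxruntime-gpu` installed, and an NCHW float input; the per-shape warm-up cost shows up as large first-run times:

```python
# Rough timing probe: with the CUDA execution provider, a new input shape can
# trigger expensive re-optimization, which dominates OCR workloads where every
# photo has a different size. "det.onnx" is a placeholder model path.
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("det.onnx", providers=["CUDAExecutionProvider"])
input_name = sess.get_inputs()[0].name

for h, w in [(640, 480), (960, 720), (1024, 768)]:  # varying sizes, like real photos
    x = np.random.rand(1, 3, h, w).astype(np.float32)
    start = time.perf_counter()
    sess.run(None, {input_name: x})
    print(f"{h}x{w}: {time.perf_counter() - start:.3f}s")
```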
I was also facing this issue on my remote machine with an RTX 3080 Ti 12GB running the server model with 10 concurrent requests. The GPU sat at 100% with 170W/350W power consumption, and 10 images took around 2 minutes to finish; even with a concurrency of 5 or 1 it was slow.
Since I also have 100K+ photos in my Immich instance, I wanted to find a way to speed up the OCR inference process; otherwise it would take forever to finish and cost an arm and a leg in electricity bills.
Solution: Routing OCR Inference to PaddleX OCR Pipeline
After investigating various options, I found that PaddleX provides a highly optimized OCR pipeline for the same model used in Immich ML, which can leverage GPU acceleration effectively even with dynamic input sizes. Cool!
My approach requires no building from source; it only relies on mounting modified files into the Immich ML and PaddleX Docker containers to route OCR inference to a PaddleX service.
Step 1: Set Up PaddleX OCR Service
First, we need to set up a PaddleX OCR service that will handle the OCR inference requests. To do this, we will create a Docker container running PaddleX with the OCR model.
- Create a new directory for the PaddleX Docker setup or clone this project’s GitHub repository.
- Create a `Dockerfile` with the following content, found here: PaddleX Dockerfile:
```dockerfile
# paddle_ocr/Dockerfile
FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.3.4-paddlepaddle3.2.0-gpu-cuda11.8-cudnn8.9-trt8.6
RUN python -m pip install "paddleocr[all]==3.3.2"
RUN paddlex --install serving
```
Note that I tested with PaddleX 3.3.4 and PaddleOCR 3.3.2 and made my changes based on these versions; future versions might require adjustments. Also, since the containers are based on Baidu’s registry, I wasn’t able to check which is the latest version available there and just grabbed the latest one I could find mentioned on GitHub.
- Put the following two files in the root of the main directory: the PaddleX replacement files `paddlex_ocr_pipeline.py` and `paddlex_ocr_result.py`.
Key changes from the original `pipeline.py`:
- Line 362: Extract `dt_scores` from the detection results
- Line 372: Pass `dt_scores` to the OCR processing data
- Lines 383, 388: Add `dt_scores` to the zip iteration to ensure scores are properly propagated
```diff
--- pipeline.py
+++ pipeline.py
@@ -359,6 +359,7 @@
         )
         dt_polys_list = [item["dt_polys"] for item in det_results]
+        dt_scores_list = [item["dt_scores"] for item in det_results]
         dt_polys_list = [self._sort_boxes(item) for item in dt_polys_list]
@@ -368,6 +369,7 @@
                 "page_index": page_index,
                 "doc_preprocessor_res": doc_preprocessor_res,
                 "dt_polys": dt_polys,
+                "dt_scores": dt_scores,
                 "model_settings": model_settings,
                 "text_det_params": text_det_params,
                 "text_type": self.text_type,
@@ -378,11 +380,12 @@
                 "rec_polys": [],
                 "vis_fonts": [],
             }
-            for input_path, page_index, doc_preprocessor_res, dt_polys in zip(
+            for input_path, page_index, doc_preprocessor_res, dt_polys, dt_scores in zip(
                 batch_data.input_paths,
                 batch_data.page_indexes,
                 doc_preprocessor_results,
                 dt_polys_list,
+                dt_scores_list,
             )
         ]
```
Key changes from the original `result.py`:
- Line 210: Add `dt_scores` to the data dictionary to ensure scores are included in the results
```diff
--- result.py
+++ result.py
@@ -210,6 +210,7 @@
         if self["model_settings"]["use_doc_preprocessor"]:
             data["doc_preprocessor_res"] = self["doc_preprocessor_res"].json["res"]
         data["dt_polys"] = self["dt_polys"]
+        data["dt_scores"] = self["dt_scores"]
         data["text_det_params"] = self["text_det_params"]
         data["text_type"] = self["text_type"]
         if "textline_orientation_angles" in self:
```
- Create an `OCR.yaml` configuration file for the PaddleX OCR pipeline with the following content, found here: OCR.yaml.
Here I used the `PP-OCRv5_server_det` and `PP-OCRv5_server_rec` models, as they fit in my 12GB of GPU memory while providing good accuracy and speed. You can experiment with other models as needed, depending on your GPU or language requirements.
The default parameter values are taken from the PaddleX and Immich ML repos. Note that some parameters can be overridden from the Immich side when calling the ML service and will take precedence over these values (see the example after the YAML below).
```yaml
pipeline_name: OCR

text_type: general

use_doc_preprocessor: True
use_textline_orientation: True

SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    use_doc_orientation_classify: True
    use_doc_unwarping: True
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
        model_dir: null
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
        model_dir: null

# followed defaults from here https://github.com/PaddlePaddle/PaddleX/blob/release/3.3/paddlex/inference/models/text_detection/predictor.py#L127
# and here https://github.com/immich-app/immich/blob/main/machine-learning/immich_ml/models/ocr/detection.py#L34
SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv5_server_det
    model_dir: null
    limit_side_len: 960
    limit_type: max
    max_side_limit: 4000
    thresh: 0.3
    box_thresh: 0.5
    unclip_ratio: 1.6
  TextLineOrientation:
    module_name: textline_orientation
    model_name: PP-LCNet_x1_0_textline_ori
    model_dir: null
    batch_size: 6
  TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv5_server_rec
    model_dir: null
    batch_size: 6
    score_thresh: 0.0
    return_word_box: True
```
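For reference on that precedence: the modified `main.py` from Step 3 sends per-request fields like the ones below, and anything present in the request overrides the matching value in this YAML. The field names come straight from the Step 3 diff; the numeric values here are made-up illustrations, not defaults:

```python
# Hypothetical example of the payload the modified main.py (Step 3) sends to
# PaddleX. Fields present here win over the OCR.yaml values; absent fields
# fall back to the YAML. 736 and 0.4 are illustrative values, not defaults.
req_payload = {
    "file": "<base64-encoded image>",
    "fileType": 1,
    "returnWordBox": False,
    "visualize": False,
    "textDetLimitSideLen": 736,  # derived from Immich's maxResolution option
    "textDetBoxThresh": 0.4,     # derived from Immich's minScore option
}
```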
After setting up these files, you should have the following structure:
```
paddle_ocr/
├── Dockerfile
├── paddlex_ocr_pipeline.py
├── paddlex_ocr_result.py
└── OCR.yaml
```
Step 2: Build and Run the PaddleX Docker Container
Now, build and run the PaddleX Docker container with the modified files mounted. Check the `paddle-ocr` service in the provided `docker-compose.yml` for reference.
```yaml
paddle-ocr:
  container_name: paddle_ocr
  build:
    context: ./paddle_ocr
    dockerfile: Dockerfile
  volumes:
    - ./model-cache/:/root/.paddlex/official_models/
    # - ./paddle_ocr/paddlex_ocr_pipeline.py:/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/ocr/pipeline.py
    # - ./paddle_ocr/paddlex_ocr_result.py:/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/ocr/result.py
    - ./paddle_ocr/paddlex_ocr_pipeline.py:/root/PaddleX/paddlex/inference/pipelines/ocr/pipeline.py
    - ./paddle_ocr/paddlex_ocr_result.py:/root/PaddleX/paddlex/inference/pipelines/ocr/result.py
    - ./paddle_ocr/OCR.yaml:/OCR.yaml
  ports:
    - "8866:8080"
  command: "paddlex --serve --pipeline /OCR.yaml"
  restart: unless-stopped
  healthcheck:
    disable: false
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities:
              - gpu
```
Note that if you change the paddlex image version in the Dockerfile and paddlex gets reinstalled and upgraded, the path to the `pipeline.py` and `result.py` files might change to the newly installed version, and you will need to adjust the volume mounts accordingly. I included two possible paths in the example above; uncomment the correct one based on your installation (see the probe snippet after the build commands below if you're unsure).
Use the following commands to build and run the container:
```bash
mkdir -p model-cache
docker-compose build paddle-ocr
docker-compose up -d paddle-ocr
```
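If you're unsure which mount path applies to your image, or want to confirm the installed versions, you can probe the running container. A minimal sketch; `probe_paths.py` is a hypothetical helper name, and the `paddle_ocr` container name comes from the compose file above:

```python
# paddle_ocr/probe_paths.py (hypothetical helper) -- run it inside the container:
#   docker exec paddle_ocr python -c "$(cat paddle_ocr/probe_paths.py)"
# Prints the installed PaddleX version and the real locations of the two files
# we override, so the volume mounts in docker-compose.yml can be fixed up.
from importlib.metadata import version

import paddlex.inference.pipelines.ocr.pipeline as pipeline_mod
import paddlex.inference.pipelines.ocr.result as result_mod

print("paddlex version:", version("paddlex"))
print("pipeline.py ->", pipeline_mod.__file__)
print("result.py   ->", result_mod.__file__)
```

If the printed paths differ from both mount candidates above, update the compose volumes to match what the probe reports.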
To verify that the PaddleX OCR service is running correctly, you can use the following commands to download a test image and send it for OCR:
Note: The first time you run this command, it might take a while as the model weights and some fonts will be downloaded and cached.
```bash
curl -sL "https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/docs/images/Banner.png" -o ./ocr_test.png
python -c "import requests, base64; r = requests.post('http://localhost:8866/ocr/', json={'file': base64.b64encode(open('./ocr_test.png', 'rb').read()).decode('utf-8'), 'fileType': 1, 'visualize': False}); result = r.json(); texts = result['result']['ocrResults'][0]['prunedResult']['rec_texts'] if r.status_code == 200 and 'result' in result else []; print(f'✓ OCR server is working - Found {len(texts)} text items:\n' + '\n'.join([f\" - {text}\" for text in texts]) if texts else f'✗ OCR server failed: {r.status_code}')"
```
You should see output similar to:
```
✓ OCR server is working - Found 16 text items:
 - PP-ChatOCRv4
 - PP-StructureV3
 - 文档解析
 - PP-StructureV3
 - 户飞架
 - PaddleOCR
 - PP-OCRv5
 - PaddleOCR3.0
 - <I>010101010101
 - PP-OCRv5
 - 文字识别
 - oduct launched
 - TextRecognition&DocParsingToolkit
 - もじ文字にんしき認識です
 - Qi
 - √s
```
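If you prefer something more readable than the one-liner, here is the same check expanded into a small script; it assumes the same localhost:8866 endpoint and the ocr_test.png downloaded above:

```python
# ocr_smoke_test.py -- expanded version of the one-liner above, same endpoint
# and payload shape; run it after downloading ocr_test.png with curl.
import base64

import requests

with open("./ocr_test.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

r = requests.post(
    "http://localhost:8866/ocr/",
    json={"file": encoded, "fileType": 1, "visualize": False},
)

result = r.json() if r.status_code == 200 else {}
if "result" in result:
    texts = result["result"]["ocrResults"][0]["prunedResult"]["rec_texts"]
    print(f"✓ OCR server is working - Found {len(texts)} text items:")
    for text in texts:
        print(f" - {text}")
else:
    print(f"✗ OCR server failed: {r.status_code}")
```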
Step 3: Modify Immich ML to Use PaddleX OCR Service
Finally, we need to modify the Immich ML service to route OCR inference requests to the PaddleX OCR service instead of using Onnxruntime.
- Similar to the PaddleX setup, we will mount a modified file into the Immich ML Docker container to override the OCR pipeline and result handling.
The file to be modified is `immich_ml/main.py`. You can find the modified version here: main.py.
Key changes from the original `main.py`:
- Added an `OCR_PADDLEX_URL` environment variable to specify the PaddleX OCR service URL.
- Modified the `preload_models` function to skip loading OCR models if `OCR_PADDLEX_URL` is set.
- Updated the `/inference` endpoint to route OCR requests to the PaddleX service when `OCR_PADDLEX_URL` is set.
- Implemented a `run_paddlex_inference` function to handle OCR requests using the PaddleX service.
- Added a `process_paddlex_results` function to process the results returned by the PaddleX OCR service and format them to match Immich ML’s expected output structure.
- Updated error handling to provide detailed logs and HTTP exceptions for PaddleX OCR inference failures.
```diff
@@ -1,5 +1,6 @@
import asyncio
import gc
+import io
import os
import signal
import threading
@@ -10,7 +11,10 @@
from typing import Any, AsyncGenerator, Callable, Iterator
from zipfile import BadZipFile
+import cv2
import orjson
+import numpy as np
+from numpy.typing import NDArray
from fastapi import Depends, FastAPI, File, Form, HTTPException
from fastapi.responses import ORJSONResponse, PlainTextResponse
from onnxruntime.capi.onnxruntime_pybind11_state import InvalidProtobuf, NoSuchFile
@@ -43,6 +47,11 @@
lock = threading.Lock()
active_requests = 0
last_called: float | None = None
+OCR_PADDLEX_URL = os.environ.get("OCR_PADDLEX_URL", None)
+if OCR_PADDLEX_URL is not None:
+ log.info(
+ f"OCR_PADDLEX_URL is set to: {OCR_PADDLEX_URL}. Will use PaddleX OCR if set instead of RapidOCR."
+ )
@asynccontextmanager
@@ -113,19 +122,21 @@
ModelTask.FACIAL_RECOGNITION,
)
- if preload.ocr.detection is not None:
+ if preload.ocr.detection is not None and OCR_PADDLEX_URL is None:
await load_models(
preload.ocr.detection,
ModelType.DETECTION,
ModelTask.OCR,
)
- if preload.ocr.recognition is not None:
+ if preload.ocr.recognition is not None and OCR_PADDLEX_URL is None:
await load_models(
preload.ocr.recognition,
ModelType.RECOGNITION,
ModelTask.OCR,
)
+ if OCR_PADDLEX_URL is not None:
+ log.info(f"Using PaddleX OCR pipeline for OCR requests from {OCR_PADDLEX_URL}")
if preload.clip_fallback is not None:
log.warning(
@@ -198,10 +209,158 @@
inputs = text
else:
raise HTTPException(400, "Either image or text must be provided")
- response = await run_inference(inputs, entries)
+ if OCR_PADDLEX_URL is not None and (entries[0][0]["task"] == ModelTask.OCR):
+ log.warning("Using PaddleX OCR pipeline for OCR requests.")
+ response = await run_paddlex_inference(inputs, entries)
+ else:
+ response = await run_inference(inputs, entries)
+ log.warning(f"Inference Output: {response}")
return ORJSONResponse(response)
+def process_paddlex_results(
+ paddlex_response_dict: dict[str, Any],
+ min_recognition_score: float,
+ image_width: int,
+ image_height: int,
+) -> dict[str, Any]:
+ texts = paddlex_response_dict["rec_texts"]
+ boxes = np.array(paddlex_response_dict["dt_polys"], dtype=np.float32)
+ if boxes.shape[0] == 0:
+ return {
+ "ocr": {
+ "box": np.empty(0, dtype=np.float32),
+ "text": [],
+ "boxScore": np.empty(0, dtype=np.float32),
+ "textScore": np.empty(0, dtype=np.float32),
+ },
+ "imageWidth": image_width,
+ "imageHeight": image_height,
+ }
+
+ boxes[:, :, 0] /= image_width
+ boxes[:, :, 1] /= image_height
+ box_scores = np.array(paddlex_response_dict["dt_scores"], dtype=np.float32)
+ text_scores = np.array(paddlex_response_dict["rec_scores"])
+ valid_text_score_idx = text_scores > min_recognition_score
+ # remove the ids where text is empty string
+ for i, text in enumerate(texts):
+ if text.strip() == "":
+ valid_text_score_idx[i] = False
+ valid_score_idx_list = valid_text_score_idx.tolist()
+
+ boxes = boxes.reshape(-1, 8)[valid_text_score_idx].reshape(-1)
+ texts = [texts[i] for i in range(len(texts)) if valid_score_idx_list[i]]
+ boxScore = box_scores[valid_text_score_idx]
+ textScore = text_scores[valid_text_score_idx]
+
+ return {
+ "ocr": {
+ "box": boxes,
+ "text": texts,
+ "boxScore": boxScore,
+ "textScore": textScore,
+ },
+ "imageWidth": image_width,
+ "imageHeight": image_height,
+ }
+
+
+async def run_paddlex_inference(
+ payload: Image | str, entries: InferenceEntries
+) -> InferenceResponse:
+ import base64
+ import requests
+ import traceback
+
+ start_time = time.time()
+ response: InferenceResponse = {}
+
+ try:
+ # Although PaddleX supports overriding some parameters per request,
+ # I only translated the ones used in Immich ML for consistency.
+ # The rest of the parameters can be changed in the ./paddle_ocr/OCR.yaml config file.
+ # Be Careful that whatever is here will override the config file settings.
+
+ if isinstance(payload, Image):
+ image_width = payload.width
+ image_height = payload.height
+ else:
+ image_width = 0
+ image_height = 0
+
+ buffered = io.BytesIO()
+ payload.save(buffered, format="JPEG")
+ encoded_image = base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+ req_payload = {
+ "file": encoded_image,
+ "fileType": 1,
+ "returnWordBox": False,
+ "visualize": False, # default is False
+ }
+
+ max_resolution = entries[0][0]["options"].get("maxResolution", None)
+ if max_resolution is not None:
+ req_payload["textDetLimitSideLen"] = max_resolution
+
+ min_detection_score = entries[0][0]["options"].get("minScore", None)
+ if min_detection_score is not None:
+ req_payload["textDetBoxThresh"] = min_detection_score
+
+ min_recognition_score = entries[1][0]["options"].get("minScore", None)
+
+ # Send the request asynchronously
+ paddlex_response = await run(
+ lambda: requests.post(OCR_PADDLEX_URL, json=req_payload)
+ )
+ log.info(f"Response Status Code: {paddlex_response.status_code}")
+ # log.info(f"Raw PaddleX OCR response: {paddlex_response.text}")
+
+ if paddlex_response.status_code != 200:
+ log.error(
+ f"PaddleX OCR inference failed with status code {paddlex_response.status_code} and message: {paddlex_response.text}"
+ )
+ raise HTTPException(
+ 500,
+ f"PaddleX OCR inference failed with status code {paddlex_response.status_code} and message: {paddlex_response.text}",
+ )
+
+ paddlex_response_dict = paddlex_response.json()["result"]["ocrResults"][0][
+ "prunedResult"
+ ]
+
+ response = await run(
+ process_paddlex_results,
+ paddlex_response_dict,
+ min_recognition_score,
+ image_width,
+ image_height,
+ )
+ end_time = time.time()
+ log.info(f"PaddleX OCR inference took {end_time - start_time:.3f} seconds.")
+ # log.warning(f"OCR PaddleX Output: {response}")
+ return response
+ except HTTPException:
+ # Re-raise HTTPExceptions as-is
+ raise
+ except requests.exceptions.RequestException as e:
+ error_msg = f"PaddleX OCR request failed: {str(e)}"
+ stack_trace = traceback.format_exc()
+ log.error(f"{error_msg}\n{stack_trace}")
+ raise HTTPException(500, f"{error_msg}\nStack trace:\n{stack_trace}")
+ except (KeyError, IndexError, ValueError) as e:
+ error_msg = f"Failed to parse PaddleX OCR response: {str(e)}"
+ stack_trace = traceback.format_exc()
+ log.error(f"{error_msg}\n{stack_trace}")
+ raise HTTPException(500, f"{error_msg}\nStack trace:\n{stack_trace}")
+ except Exception as e:
+ error_msg = f"Unexpected error during PaddleX OCR inference: {str(e)}"
+ stack_trace = traceback.format_exc()
+ log.error(f"{error_msg}\n{stack_trace}")
+ raise HTTPException(500, f"{error_msg}\nStack trace:\n{stack_trace}")
+
+
async def run_inference(
payload: Image | str, entries: InferenceEntries
) -> InferenceResponse:
```
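To make the post-processing concrete, here is a toy run (fake data, not actual service output) of the box normalization and flattening that `process_paddlex_results` performs; the score and empty-text filtering is omitted for brevity:

```python
import numpy as np

# Toy data: one detected quadrilateral on a 1000x500 image.
image_width, image_height = 1000, 500
dt_polys = np.array([[[100, 50], [300, 50], [300, 150], [100, 150]]], dtype=np.float32)

# Same normalization as process_paddlex_results: coordinates become fractions
# of the image size, then the (N, 4, 2) polygons are flattened into the flat
# 8-values-per-box layout Immich ML expects.
dt_polys[:, :, 0] /= image_width
dt_polys[:, :, 1] /= image_height
flat = dt_polys.reshape(-1, 8).reshape(-1)
print(flat)  # [0.1 0.1 0.3 0.1 0.3 0.3 0.1 0.3]
```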
- Update your Immich ML Docker Compose configuration to mount the modified `main.py` file. Check the `immich-machine-learning` service in the provided `docker-compose.yml`:
```yaml
immich-machine-learning:
  container_name: immich_machine_learning
  # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino, rknn] to the image tag.
  # Example tag: ${IMMICH_VERSION:-release}-cuda
  image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
  volumes:
    - ./model-cache:/cache
    - ./immich_ml_main.py:/usr/src/immich_ml/main.py
  ports:
    - "33003:3003" # modify to your desired port mapping
  environment:
    - OCR_PADDLEX_URL=http://paddle_ocr:8080/ocr # Pointing to the PaddleX OCR service
  restart: always
  healthcheck:
    disable: false
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities:
              - gpu
```
Start or restart the Immich ML container to apply the changes:
```bash
docker-compose up -d immich-machine-learning
```
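To confirm the patched container came up and picked up the environment variable, watch the container logs for the `OCR_PADDLEX_URL` startup message, or run a quick liveness probe; this sketch assumes Immich ML's `/ping` endpoint and the 33003 port mapping from the compose file above:

```python
# Liveness probe for the patched Immich ML container (assumes the 33003:3003
# port mapping above; /ping is Immich ML's simple health endpoint).
import requests

r = requests.get("http://localhost:33003/ping", timeout=5)
print(r.status_code, r.text)  # expecting a 200 "pong"-style response
```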
And that’s it! You have successfully set up PaddleX for fast OCR inference on GPU for Immich ML. You should now see a significant improvement in OCR inference times when processing images in your Immich instance.
