Single-container OCR service running on Alpine Linux. Exposes an HTTP API to extract text from uploaded images using Tesseract via pytesseract.
- Languages: English (eng) and Malay (msa)
- Default per-request language: eng+msa
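The language rules above could be enforced with a small validation helper. This is a minimal sketch, not the contents of app/config.py: the names `DEFAULT_LANGUAGE`, `SUPPORTED_LANGUAGES`, and `normalize_language` are hypothetical, and treating `msa+eng` as equivalent to `eng+msa` is an assumption.

```python
DEFAULT_LANGUAGE = "eng+msa"
SUPPORTED_LANGUAGES = ("eng", "eng+msa", "msa")


def normalize_language(lang=None):
    """Return a supported Tesseract language string, or raise ValueError."""
    # A missing or blank value falls back to the default language.
    if lang is None or not lang.strip():
        return DEFAULT_LANGUAGE
    # Assumption: order does not matter, so "msa+eng" is accepted as "eng+msa".
    parts = sorted(p.strip().lower() for p in lang.split("+") if p.strip())
    candidate = "+".join(parts)
    if candidate not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {lang!r}; expected one of {SUPPORTED_LANGUAGES}"
        )
    return candidate
```

Rejecting unknown values before calling Tesseract keeps a bad `lang` from surfacing as an opaque OCR error.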
- Dockerfile – Alpine image with Python, Tesseract, and app
- docker-compose.yml – service definition
- requirements.txt – Python dependencies
- app/config.py – language config and validation
- app/ocr_service.py – OCR logic using pytesseract
- app/api.py – FastAPI app exposing /health and /ocr
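For orientation, the OCR step might look roughly like the sketch below. It is not the actual app/ocr_service.py: `extract_text` and `_tesseract_engine` are hypothetical names, decoding uploads with Pillow is an assumption (requirements.txt is not shown here), and stripping surrounding whitespace from the result is an illustrative choice. The engine is passed in as a parameter so the function can be exercised without Tesseract installed.

```python
import io


def _tesseract_engine(image_bytes, lang):
    """Default engine: decode the image with Pillow and run Tesseract on it."""
    # Imported lazily so this module can be imported without either package.
    import pytesseract
    from PIL import Image

    with Image.open(io.BytesIO(image_bytes)) as image:
        return pytesseract.image_to_string(image, lang=lang)


def extract_text(image_bytes, filename, lang="eng+msa", engine=_tesseract_engine):
    """Run OCR and return a dict with filename, language, and text keys."""
    if not image_bytes:
        raise ValueError("empty upload")
    text = engine(image_bytes, lang)
    # Assumption: trailing newlines from Tesseract are not useful to callers.
    return {"filename": filename, "language": lang, "text": text.strip()}
```

A stub engine makes the function easy to test, for example `extract_text(b"abc", "a.png", "eng", engine=lambda data, lang: "hello\n")`.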
From the project root:

    docker compose up --build ocr-api

The API will listen on:

    http://localhost:8000
GET /health: health and configuration info.
Example:

    curl http://localhost:8000/health

Expected JSON (shape):

    {
      "status": "ok",
      "default_language": "eng+msa",
      "supported_languages": ["eng", "eng+msa", "msa"],
      "tesseract_languages": ["eng", "msa", "..."]
    }

POST /ocr: upload an image, get back extracted text.
- Request: multipart/form-data
  - Field file: image file (PNG/JPEG/etc.)
  - Optional field or query param lang: eng, msa, or eng+msa (default)
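The request format above can also be exercised from Python using only the standard library. The following is a sketch under assumptions: `build_multipart` and `ocr_image` are hypothetical helper names, the base URL assumes the default compose setup, and the guessed `Content-Type` of the file part may or may not matter to the server.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(filename, image_bytes, lang=None):
    """Encode the file field (and optional lang field) as multipart/form-data."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if lang is not None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="lang"'
            f"\r\n\r\n{lang}\r\n".encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n"
        ).encode()
        + image_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def ocr_image(path, lang=None, base_url="http://localhost:8000"):
    """POST an image to /ocr and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(os.path.basename(path), fh.read(), lang)
    request = urllib.request.Request(
        f"{base_url}/ocr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Omitting `lang` leaves the choice to the server, which falls back to eng+msa.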
Examples:

    # Default languages (eng+msa)
    curl -F "file=@/path/to/image.png" \
      http://localhost:8000/ocr

    # English only
    curl -F "file=@/path/to/english.png" \
      -F "lang=eng" \
      http://localhost:8000/ocr

    # Malay only
    curl -F "file=@/path/to/malay.png" \
      -F "lang=msa" \
      http://localhost:8000/ocr

Response (shape):
    {
      "filename": "image.png",
      "language": "eng+msa",
      "text": "... OCR result ..."
    }

- The Dockerfile currently installs:
  - tesseract-ocr
  - tesseract-ocr-data-eng
  - tesseract-ocr-data-msa
- If the build fails because a package name is not found:
  - Start a temporary Alpine container:

        docker run --rm -it python:3.11-alpine sh

  - Inside it, inspect the available Tesseract packages:

        apk update
        apk search 'tesseract-ocr*'

  - Adjust the package names in the Dockerfile to match what your Alpine repo provides.
  - After the image builds successfully, verify inside a running container:

        docker compose run --rm ocr-api sh
        # inside the container
        tesseract --version
        tesseract --list-langs

    You should see eng and msa in the language list.
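The language check above could be automated, for instance as a startup or CI step. This is a rough sketch, assuming `tesseract --list-langs` prints a header line beginning with "List of available languages" followed by one language code per line; the helper names `parse_list_langs`, `missing_languages`, and `check_installed` are hypothetical.

```python
import subprocess

REQUIRED_LANGUAGES = {"eng", "msa"}


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line (assumed format).
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def missing_languages(output, required=REQUIRED_LANGUAGES):
    """Return the required languages that are absent from the output, sorted."""
    return sorted(set(required) - set(parse_list_langs(output)))


def check_installed():
    """Run Tesseract and report which required languages are missing."""
    # Capture both streams in case the list is written to stderr.
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    return missing_languages(result.stdout + "\n" + result.stderr)
```

An empty list from `check_installed()` means both eng and msa are available inside the container.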