mostlylucid-nmt (EasyNMT-compatible API)

A drop-in EasyNMT replacement that runs locally, survives long-running workloads, and won't OOM your server when someone pastes a book into it.

Not a hosted SaaS, not a black box API. This project optimises for predictability, bounded memory, and failure modes you can reason about.

Why this exists (the practical bits):

On-demand model loading + LRU eviction to avoid OOM and keep long-running services stable
Backpressure + request queueing with smart Retry-After so clients can behave nicely under load
Markdown-safe translation pipeline (sanitization + symbol masking) to prevent parser depth/bracket failures

EasyNMT-compatible endpoints: /translate (GET/POST), /lang_pairs, /language_detection, /model_name
Swagger: /docs · ReDoc: /redoc · Complete Guide

Quick Start

Just want a single binary? Skip Docker — see Standalone Executable below.

CPU (persistent cache) — most users start here

docker run -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  scottgal/mostlylucid-nmt:cpu

GPU (persistent cache + FP16)

docker run --gpus all -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  -e USE_GPU=true \
  -e EASYNMT_MODEL_ARGS='{"torch_dtype":"fp16"}' \
  scottgal/mostlylucid-nmt:gpu

Highest Quality (LLM-based, GPU required, significantly slower)

Reality check: 7B models require 16GB+ VRAM and are 50-100x slower than opus-mt. They are not a free upgrade.

# HY-MT: Best quality for major languages (en, zh, de, fr, es, ja, ko, etc.)
docker run --gpus all -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  -e USE_GPU=true \
  -e MODEL_FAMILY=hymt \
  -e HYMT_MODEL_SIZE=7B \
  scottgal/mostlylucid-nmt:gpu

# MADLAD: 400+ languages including rare/low-resource
docker run --gpus all -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  -e USE_GPU=true \
  -e MODEL_FAMILY=madlad \
  -e MADLAD_MODEL_SIZE=7b \
  scottgal/mostlylucid-nmt:gpu

Important (GPU): USE_GPU=true does nothing unless you run with --gpus all.

Important (cache): Without -v ./model-cache:/models + MODEL_CACHE_DIR=/models, models re-download on every restart.

Try the API

curl -X POST http://localhost:8000/translate \
  -H 'Content-Type: application/json' \
  -d '{"text":["Hello world"],"source_lang":"en","target_lang":"de","beam_size":1}'
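The same request from Python, as a minimal sketch using only the standard library. The helper names `build_translate_request` and `send` are illustrative and not part of this project; the URL and JSON fields are copied from the curl example above.

```python
import json
import urllib.request


def build_translate_request(texts, source_lang, target_lang,
                            base_url="http://localhost:8000", beam_size=1):
    """Build a POST /translate request mirroring the curl example."""
    payload = {
        "text": list(texts),
        "source_lang": source_lang,
        "target_lang": target_lang,
        "beam_size": beam_size,
    }
    return urllib.request.Request(
        url=f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With a container running, `send(build_translate_request(["Hello world"], "en", "de"))` should return the decoded JSON response.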

Model Families

Quick chooser:

If you want…	Use
Fast, cheap, production	opus-mt (default)
One model, many languages	m2m100
Best quality major languages	hymt
Rare / low-resource languages	madlad

Set the family with the MODEL_FAMILY environment variable, or per request with "model_family": "..." in the JSON body:

Fast (CPU-friendly, recommended):

opus-mt (default): Best quality/speed, 1200+ directional pairs
mbart50: Single model, 50 languages
m2m100: Single model, 100 languages

High quality (GPU recommended):

hymt: LLM-based (Tencent Hunyuan), best quality for major language pairs
madlad: T5-based (Google), 400+ languages including low-resource

# Per-request override
curl -X POST http://localhost:8000/translate \
  -H 'Content-Type: application/json' \
  -d '{"text":["Hello"],"source_lang":"en","target_lang":"zh","model_family":"hymt"}'

Auto-fallback (enabled by default): If a pair isn't available in your chosen family, the service automatically tries the other families in its fallback order.
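To make the fallback idea concrete, here is a rough sketch of how such a selection could work. This is not the project's code: `pick_family`, the availability table, and the fallback order are all assumptions for illustration, since this README does not state the actual order.

```python
def pick_family(pair, preferred, available, fallback_order):
    """Return the first family that supports `pair`, preferred family first.

    `available` maps family name -> set of (source, target) pairs.
    Returns None when no family supports the pair.
    """
    candidates = [preferred] + [f for f in fallback_order if f != preferred]
    for family in candidates:
        if pair in available.get(family, set()):
            return family
    return None


# Illustrative data only; real availability comes from the service.
AVAILABLE = {
    "opus-mt": {("en", "de"), ("de", "en")},
    "m2m100": {("en", "de"), ("en", "zu")},
}
ORDER = ["opus-mt", "mbart50", "m2m100"]  # assumed order, not documented here
```

With this toy data, an ("en", "zu") request that prefers opus-mt would land on m2m100, and a pair nobody supports yields `None`.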

Backends

CTranslate2: Smaller footprint, faster CPU inference (standalone exe + CPU images)
Transformers/PyTorch: Best CUDA support (GPU images)

Auto-selected based on what's installed. You normally don't need to configure this.

Reference

CLI Commands & Options

Command	Description	Example
`translate`	Translate text directly	`translate "Hello" --to de`
`server`	Start HTTP API server	`server --port 8000`
`mcp`	MCP server for LLM integration	`mcp`
`languages`	List supported languages	`languages --json`
`status`	Check server health & cache	`status`
`info`	Show configuration	`info`

# Global options
--version, -v         Show version
--json, -j            JSON output format
--host                Server host (default: 127.0.0.1)
--port, -p            Server port (default: 8000)

# translate options
--to, -t              Target language (required)
--source, -s          Source language (default: en, or auto-detect)
--model, -m           Model family: opus-mt, mbart50, m2m100

# server options
--workers, -w         Number of workers (default: 1)
--reload              Auto-reload for development
--background, -b      Run as daemon

Smart Server Detection: If a server is already running, the CLI uses its HTTP API (fast). Otherwise it loads the model directly (slower the first time).

MCP Integration (Claude Code / LLMs)

// ~/.claude/settings.json
{
  "mcpServers": {
    "translate": {
      "command": "mostlylucid-nmt",
      "args": ["mcp"]
    }
  }
}

Available tools: translate, detect_language, list_languages

Model Family Details & Performance

Fast Models (CPU-friendly)

Family	Languages	Pairs	Model Type	Size	Speed
opus-mt	150+	1200+	Separate per direction	~300MB each	Fast
mbart50	50	2,450	Single multilingual	~2.4GB	Fast
m2m100	100	9,900	Single multilingual	~2.2GB	Fast

High-Quality Models (GPU recommended)

Family	Languages	Model Type	Size	Speed	Notes
hymt	34	LLM (1.8B/7B)	3.5-14GB	Slow	Best quality for major pairs
madlad	400+	T5 (3B/7B/10B)	6-20GB	Slow	Best for rare languages

Performance Benchmark

Translating ~385 chars (README excerpt) English→German:

Model	Time	Speed	Relative
opus-mt	0.93s	412 chars/sec	1.0x baseline
m2m100	5.02s	76 chars/sec	5.4x slower
hymt	44.35s	9 chars/sec	48x slower
madlad	105.53s	4 chars/sec	114x slower

Tested on AMD Ryzen 9 9950X / NVIDIA RTX A4000 16GB, CPU inference. GPU is significantly faster for hymt/madlad.

Configuration

HY-MT:

HYMT_MODEL_SIZE=1.8B      # 1.8B (default) or 7B
HYMT_TOP_K=20             # Top-k sampling
HYMT_TOP_P=0.6            # Nucleus sampling
HYMT_TEMPERATURE=0.7      # Generation temperature

MADLAD:

MADLAD_MODEL_SIZE=3b      # 3b (default), 7b, or 10b
MADLAD_MAX_LENGTH=256     # Max output length

API Endpoints

Full API documentation at /docs (Swagger UI).

POST /translate:

curl -X POST http://localhost:8000/translate \
  -H 'Content-Type: application/json' \
  -d '{"text": ["Hello world"], "target_lang": "de", "source_lang": "en", "beam_size": 1}'

Other endpoints:

GET /lang_pairs - All supported language pairs
POST /language_detection - Detect language of input text
GET /healthz - Health check
GET /readyz - Readiness check
GET /cache - Model cache status
GET /discover/opus-mt|mbart50|m2m100|all - Available pairs
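
A deployment script may want to wait for the service before sending traffic. The sketch below polls /readyz; it assumes the endpoint answers HTTP 200 once the service is ready, which this README does not spell out. `is_ready` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url="http://localhost:8000", path="/readyz", timeout=5.0):
    """Return True if GET <base_url><path> answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_ready(check=is_ready, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

The `check` and `sleep` parameters exist so the loop can be exercised without a server or real waiting.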

Configuration Reference

Device & Model

USE_GPU=auto                # true|false|auto
DEVICE=auto                 # cpu|cuda|cuda:0
MODEL_FAMILY=opus-mt        # opus-mt|mbart50|m2m100|hymt|madlad
AUTO_MODEL_FALLBACK=1       # Try other families if pair unavailable
MODEL_CACHE_DIR=/models     # Cache directory for volume mapping
MAX_CACHED_MODELS=10        # LRU cache capacity
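
For readers unfamiliar with LRU eviction, this is roughly what MAX_CACHED_MODELS bounds. The class below is a simplified sketch, not the service's implementation; `LRUModelCache` and its `loader` callable are made-up names.

```python
from collections import OrderedDict


class LRUModelCache:
    """Keep at most `capacity` loaded models; evict the least recently used."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # hypothetical callable: key -> model
        self._models = OrderedDict()  # oldest entry first
        self.evicted = []             # record of evicted keys, for inspection

    def get(self, key):
        if key in self._models:
            self._models.move_to_end(key)   # mark as most recently used
            return self._models[key]
        model = self.loader(key)            # load on demand
        self._models[key] = model
        while len(self._models) > self.capacity:
            old_key, _ = self._models.popitem(last=False)
            self.evicted.append(old_key)    # a real service would free memory here
        return model
```

With a capacity of 2, requesting en->de, de->en, en->de and then en->fr evicts de->en, because en->de was touched more recently.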

Performance

EASYNMT_BATCH_SIZE=16       # 16 CPU, 64 GPU
EASYNMT_MODEL_ARGS='{"torch_dtype":"fp16"}'  # FP16 for GPU
PRELOAD_MODELS="en->de,de->en"
MAX_INFLIGHT_TRANSLATIONS=1  # auto: 1 GPU, 4 CPU
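
The value formats above can be checked before starting a container. This is a small validation sketch based only on the examples shown (`src->tgt` pairs separated by commas, and a JSON object for model args); how the service itself parses these variables is an assumption.

```python
import json


def parse_preload(value):
    """Turn 'en->de,de->en' into [('en', 'de'), ('de', 'en')]."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        source, sep, target = item.partition("->")
        if not sep or not source.strip() or not target.strip():
            raise ValueError(f"expected 'src->tgt', got {item!r}")
        pairs.append((source.strip(), target.strip()))
    return pairs


def parse_model_args(value):
    """Decode the EASYNMT_MODEL_ARGS JSON object; empty input means no args."""
    if not value:
        return {}
    args = json.loads(value)
    if not isinstance(args, dict):
        raise ValueError("EASYNMT_MODEL_ARGS must be a JSON object")
    return args
```

A malformed JSON string raises `json.JSONDecodeError` (a `ValueError` subclass), so both helpers fail the same way on bad input.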

Memory Management

ENABLE_MEMORY_MONITOR=1     # Auto-evict on high memory
MEMORY_CRITICAL_THRESHOLD=90.0
GPU_MEMORY_CRITICAL_THRESHOLD=90.0

Chunk Cache (LFU)

CHUNK_CACHE_ENABLED=1
CHUNK_CACHE_CAPACITY=10000  # ~25MB at capacity
CHUNK_CACHE_MAX_AGE=3600    # TTL in seconds
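
How capacity and max age interact in an LFU cache can be sketched as follows. This is an illustration of the general technique, not the project's cache: `LFUChunkCache` is a hypothetical name, and the linear eviction scan is chosen for brevity rather than speed.

```python
import time


class LFUChunkCache:
    """Tiny LFU cache with a max age; evicts the least frequently used entry."""

    def __init__(self, capacity, max_age, clock=time.monotonic):
        self.capacity = capacity
        self.max_age = max_age      # seconds an entry stays valid
        self.clock = clock          # injectable for testing
        self._entries = {}          # key -> [value, hits, stored_at]

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self.clock() - entry[2] > self.max_age:
            del self._entries[key]  # expired
            return None
        entry[1] += 1               # count the hit
        return entry[0]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self._entries and len(self._entries) >= self.capacity:
            # O(n) scan keeps the sketch short; ties go to the oldest entry.
            victim = min(
                self._entries,
                key=lambda k: (self._entries[k][1], self._entries[k][2]),
            )
            del self._entries[victim]
        # Re-storing an existing key resets its hit count and age.
        self._entries[key] = [value, 0, self.clock()]
```

Unlike LRU, a chunk that was hit many times survives a burst of one-off entries, which suits repeated boilerplate in translated documents.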

Queueing & Timeouts

ENABLE_QUEUE=1
MAX_QUEUE_SIZE=1000
TIMEOUT=120
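
On the client side, behaving nicely under load means waiting for the advertised Retry-After before retrying a 429. A minimal sketch, assuming the header carries a delay in seconds (the HTTP-date form is not handled); `call_with_retry` and the `send` callable returning `(status, headers, body)` are hypothetical.

```python
import time


def call_with_retry(send, max_attempts=5, default_delay=1.0, sleep=time.sleep):
    """Call `send()` until it stops returning HTTP 429.

    `send` is a hypothetical callable returning (status, headers, body),
    where `headers` is a plain dict keyed by 'Retry-After'.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        try:
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay   # e.g. an HTTP-date value
        sleep(max(0.0, delay))
    return status, body             # still 429 after the last attempt
```

Passing `sleep` as a parameter keeps the function testable without real waiting.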

Logging

LOG_LEVEL=INFO        # DEBUG for verbose
LOG_TO_FILE=1         # Enable file logging
LOG_FORMAT=json       # For log aggregation

Markdown Sanitization

Prevents parser depth errors in translated markdown:

MARKDOWN_SANITIZE=1
MARKDOWN_SAFE_MODE_AUTO=1  # Auto-enable for RTL targets
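
The general idea behind symbol masking is to swap fragile markup for placeholders before translation and restore it afterwards. The sketch below shows one common way to do that for inline code and link targets; it is an assumption about the technique, not the project's pipeline, and the `@@n@@` placeholder format is invented here.

```python
import re

# Inline code spans and link targets: text a translator should not touch.
_PROTECTED = re.compile(r"`[^`\n]+`|\]\([^)\s]+\)")
_PLACEHOLDER = re.compile(r"@@(\d+)@@")


def mask(text):
    """Replace protected spans with numbered placeholders."""
    saved = []

    def _swap(match):
        saved.append(match.group(0))
        return f"@@{len(saved) - 1}@@"

    return _PROTECTED.sub(_swap, text), saved


def unmask(text, saved):
    """Put the original spans back in place of their placeholders."""

    def _restore(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched rather than failing.
        return saved[index] if index < len(saved) else match.group(0)

    return _PLACEHOLDER.sub(_restore, text)
```

A typical flow is `masked, saved = mask(source)`, translate `masked`, then call `unmask(translated, saved)`. Input that already contains text shaped like `@@0@@` would collide with the placeholders, so a real implementation needs a more robust token.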

Troubleshooting

Issue	Solution
GPU not used	Add `--gpus all` flag to docker run
Models re-download	Add `-v ./model-cache:/models -e MODEL_CACHE_DIR=/models`
Out of memory	Use FP16: `-e EASYNMT_MODEL_ARGS='{"torch_dtype":"fp16"}'`
429 errors	Increase `MAX_QUEUE_SIZE` or check `Retry-After` header
Missing pair	Enable auto-fallback (default ON) or try a different model family

Building from Source

Standalone executable:

pip install pyinstaller
pyinstaller mostlylucid_nmt.spec --clean

Docker:

docker build -f Dockerfile.min -t mostlylucid-nmt:cpu .
docker build -f Dockerfile.gpu.min -t mostlylucid-nmt:gpu .

See BUILD.md for full instructions.

License

MIT

EasyNMT Compatibility

For strict EasyNMT response shapes, use the /compat endpoints:

# GET (EasyNMT format)
curl "http://localhost:8000/compat/translate?target_lang=de&text=Hello&source_lang=en"
# => { "translations": ["Hallo"] }

# POST (EasyNMT format)
curl -X POST http://localhost:8000/compat/translate \
  -H 'Content-Type: application/json' \
  -d '{"text": ["Hello"], "target_lang": "de", "source_lang": "en"}'
# => { "target_lang": "de", "source_lang": "en", "translated": ["Hallo"], "translation_time": 0.1 }

Standard /translate endpoints include optional extras (pivot_path, metadata, etc.).
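
Because the two /compat examples above return the translated strings under different keys ("translations" for GET, "translated" for POST), a client may want one accessor for both. A small sketch; `extract_translations` is a hypothetical helper, and it only covers the two shapes shown above.

```python
def extract_translations(response):
    """Return the translated strings from either /compat response shape."""
    if "translations" in response:      # shape shown for GET /compat/translate
        return list(response["translations"])
    if "translated" in response:        # shape shown for POST /compat/translate
        return list(response["translated"])
    raise KeyError("no 'translations' or 'translated' field in response")
```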
