A drop-in EasyNMT replacement that runs locally, survives long-running workloads, and won't OOM your server when someone pastes a book into it.
Not a hosted SaaS, not a black box API. This project optimises for predictability, bounded memory, and failure modes you can reason about.
Why this exists (the practical bits):
- On-demand model loading + LRU eviction to avoid OOM and keep long-running services stable
- Backpressure + request queueing with smart `Retry-After` headers so clients can behave nicely under load
- Markdown-safe translation pipeline (sanitization + symbol masking) to prevent parser depth/bracket failures
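The symbol-masking idea behind the Markdown-safe pipeline can be sketched as follows: hide fragments a translation model tends to mangle behind opaque placeholders, translate the rest, then restore them. This is a minimal illustration, not the project's pipeline; the protected patterns (inline code spans and link targets), the `⟦n⟧` placeholder format and the function names are all assumptions.

```python
import re
from typing import Callable, List, Tuple

# Assumed set of fragments to protect: inline code spans and link/image targets.
PROTECTED = re.compile(r"`[^`\n]+`|\]\([^)\s]+\)")
# Assumed placeholder format, chosen to avoid markdown-significant characters.
PLACEHOLDER = re.compile(r"⟦(\d+)⟧")


def mask(text: str) -> Tuple[str, List[str]]:
    """Replace protected fragments with numbered placeholders."""
    saved: List[str] = []

    def _store(match: "re.Match[str]") -> str:
        saved.append(match.group(0))
        return f"⟦{len(saved) - 1}⟧"

    return PROTECTED.sub(_store, text), saved


def unmask(text: str, saved: List[str]) -> str:
    """Put the original fragments back; unknown placeholders are left as-is."""

    def _restore(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        return saved[index] if index < len(saved) else match.group(0)

    return PLACEHOLDER.sub(_restore, text)


def translate_markdown(text: str, translate: Callable[[str], str]) -> str:
    """Mask, translate only the remaining prose, then unmask."""
    masked, saved = mask(text)
    return unmask(translate(masked), saved)
```

Passing `str.upper` as a stand-in translator upper-cases the prose while the inline code span and the link URL come back unchanged.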
EasyNMT-compatible endpoints: /translate (GET/POST), /lang_pairs, /language_detection, /model_name
Swagger: /docs · ReDoc: /redoc · Complete Guide
Just want a single binary? Skip Docker — see Standalone Executable below.
CPU:

```bash
docker run -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  scottgal/mostlylucid-nmt:cpu
```

GPU:

```bash
docker run --gpus all -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  -e USE_GPU=true \
  -e EASYNMT_MODEL_ARGS='{"torch_dtype":"fp16"}' \
  scottgal/mostlylucid-nmt:gpu
```

Reality check: 7B models require 16GB+ VRAM and are 50-100x slower than opus-mt. They are not a free upgrade.
```bash
# HY-MT: Best quality for major languages (en, zh, de, fr, es, ja, ko, etc.)
docker run --gpus all -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  -e USE_GPU=true \
  -e MODEL_FAMILY=hymt \
  -e HYMT_MODEL_SIZE=7B \
  scottgal/mostlylucid-nmt:gpu

# MADLAD: 400+ languages including rare/low-resource
docker run --gpus all -p 8000:8000 \
  -v ./model-cache:/models \
  -e MODEL_CACHE_DIR=/models \
  -e USE_GPU=true \
  -e MODEL_FAMILY=madlad \
  -e MADLAD_MODEL_SIZE=7b \
  scottgal/mostlylucid-nmt:gpu
```

Important (GPU): `USE_GPU=true` does nothing unless you run with `--gpus all`.

Important (cache): Without `-v ./model-cache:/models` + `MODEL_CACHE_DIR=/models`, models re-download on every restart.
```bash
curl -X POST http://localhost:8000/translate \
  -H 'Content-Type: application/json' \
  -d '{"text":["Hello world"],"source_lang":"en","target_lang":"de","beam_size":1}'
```

Quick chooser:
| If you want… | Use |
|---|---|
| Fast, cheap, production | opus-mt (default) |
| One model, many languages | m2m100 |
| Best quality major languages | hymt |
| Rare / low-resource languages | madlad |
Set the family with the `MODEL_FAMILY` environment variable, or per request with `"model_family": "..."`:
Fast (CPU-friendly, recommended):
- opus-mt (default): Best quality/speed trade-off, 1200+ directional pairs
- mbart50: Single model, 50 languages
- m2m100: Single model, 100 languages
High quality (GPU recommended):
- hymt: LLM-based (Tencent Hunyuan), best quality for major language pairs
- madlad: T5-based (Google), 400+ languages including low-resource
```bash
# Per-request override
curl -X POST http://localhost:8000/translate \
  -H 'Content-Type: application/json' \
  -d '{"text":["Hello"],"source_lang":"en","target_lang":"zh","model_family":"hymt"}'
```

Auto-fallback (enabled by default via `AUTO_MODEL_FALLBACK=1`): if a pair isn't available in your chosen family, the service automatically tries the other families in its fallback order.
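Conceptually, auto-fallback is a first-match search over the families that can serve the requested pair. A minimal sketch, assuming a precomputed map of available pairs per family; the fallback order and the `pick_family` name are hypothetical placeholders, not the project's actual implementation.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (source_lang, target_lang)

# Hypothetical order: cheaper families first. The real order may differ.
FALLBACK_ORDER: Sequence[str] = ("opus-mt", "mbart50", "m2m100", "madlad", "hymt")


def pick_family(
    pair: Pair,
    preferred: str,
    available: Dict[str, Set[Pair]],
    auto_fallback: bool = True,  # mirrors AUTO_MODEL_FALLBACK=1
    order: Sequence[str] = FALLBACK_ORDER,
) -> Optional[str]:
    """Return the first family that supports `pair`, trying `preferred` first."""
    candidates = [preferred]
    if auto_fallback:
        candidates += [family for family in order if family != preferred]
    for family in candidates:
        if pair in available.get(family, set()):
            return family
    return None  # no family can serve this pair
```

Trying the preferred family first keeps explicit choices (environment or per-request) authoritative; fallback only kicks in when that family genuinely lacks the pair.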
Inference backends:
- CTranslate2: Smaller footprint, faster CPU inference (standalone exe + CPU images)
- Transformers/PyTorch: Best CUDA support (GPU images)
Auto-selected based on what's installed. You normally don't need to configure this.
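One way such auto-selection can work is to probe for installed packages without importing them. A sketch under stated assumptions: the preference rule (Transformers when a GPU is wanted, CTranslate2 otherwise) is inferred from the descriptions above, and `ctranslate2` / `torch` are the usual import names rather than confirmed details of this project.

```python
import importlib.util
from typing import Callable


def is_installed(module: str) -> bool:
    """True if `module` can be imported, without actually importing it."""
    return importlib.util.find_spec(module) is not None


def pick_backend(prefer_gpu: bool = False,
                 installed: Callable[[str], bool] = is_installed) -> str:
    has_ct2 = installed("ctranslate2")
    has_torch = installed("torch")
    # Assumed rule: CUDA work goes to Transformers/PyTorch when it is available.
    if prefer_gpu and has_torch:
        return "transformers"
    # Otherwise prefer the smaller, faster-on-CPU CTranslate2 runtime.
    if has_ct2:
        return "ctranslate2"
    if has_torch:
        return "transformers"
    raise RuntimeError("No inference backend found: install ctranslate2 or torch")
```

The `installed` parameter exists only so the rule can be exercised without either package present.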
CLI Commands & Options
| Command | Description | Example |
|---|---|---|
| `translate` | Translate text directly | `translate "Hello" --to de` |
| `server` | Start HTTP API server | `server --port 8000` |
| `mcp` | MCP server for LLM integration | `mcp` |
| `languages` | List supported languages | `languages --json` |
| `status` | Check server health & cache | `status` |
| `info` | Show configuration | `info` |
```text
# Global options
--version, -v     Show version
--json, -j        JSON output format
--host            Server host (default: 127.0.0.1)
--port, -p        Server port (default: 8000)

# translate options
--to, -t          Target language (required)
--source, -s      Source language (default: en, or auto-detect)
--model, -m       Model family: opus-mt, mbart50, m2m100

# server options
--workers, -w     Number of workers (default: 1)
--reload          Auto-reload for development
--background, -b  Run as daemon
```

Smart server detection: if a server is already running, the CLI uses its HTTP API (fast). Otherwise it loads the model directly (slower the first time).
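Smart detection boils down to a cheap health probe before deciding where to run. A minimal sketch, assuming the CLI probes the server's `/healthz` endpoint (listed under API Endpoints) with a short timeout; `server_is_up` and `choose_mode` are hypothetical names.

```python
import http.client
import urllib.request
from typing import Callable


def server_is_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 0.5) -> bool:
    """Return True if GET /healthz answers 200 within `timeout` seconds."""
    url = f"http://{host}:{port}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    # OSError covers URLError/HTTPError, refused connections and timeouts;
    # HTTPException covers a non-HTTP service answering on that port.
    except (OSError, http.client.HTTPException):
        return False


def choose_mode(probe: Callable[[], bool] = server_is_up) -> str:
    """'http' reuses the running server; 'local' loads the model in-process."""
    return "http" if probe() else "local"
```

A short timeout matters here: when no server is running, the probe should fail fast so the local path is not delayed.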
MCP Integration (Claude Code / LLMs)
Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "translate": {
      "command": "mostlylucid-nmt",
      "args": ["mcp"]
    }
  }
}
```

Available tools: `translate`, `detect_language`, `list_languages`
Model Family Details & Performance
| Family | Languages | Pairs | Model Type | Size | Speed |
|---|---|---|---|---|---|
| opus-mt | 150+ | 1200+ | Separate per direction | ~300MB each | Fast |
| mbart50 | 50 | 2,450 | Single multilingual | ~2.4GB | Fast |
| m2m100 | 100 | 9,900 | Single multilingual | ~2.2GB | Fast |

High quality (GPU recommended):

| Family | Languages | Model Type | Size | Speed | Notes |
|---|---|---|---|---|---|
| hymt | 34 | LLM (1.8B/7B) | 3.5-14GB | Slow | Best quality for major pairs |
| madlad | 400+ | T5 (3B/7B/10B) | 6-20GB | Slow | Best for rare languages |
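Because opus-mt loads a separate model (~300MB) per direction, serving many pairs would exhaust memory without a bound; that is the job of the LRU cache sized by `MAX_CACHED_MODELS`. A minimal sketch of on-demand loading with LRU eviction, assuming a `loader` callable and an `on_evict` hook (both hypothetical, not this project's API):

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ModelLRU(Generic[K, V]):
    """Load models on first use; keep at most `capacity`, evicting the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[K], V],
                 on_evict: Callable[[K, V], None] = lambda key, model: None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self._loader = loader
        self._on_evict = on_evict
        self._models: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._models:
            self._models.move_to_end(key)  # mark as most recently used
            return self._models[key]
        # Evict *before* loading so peak memory never exceeds `capacity` models.
        while len(self._models) >= self.capacity:
            old_key, old_model = self._models.popitem(last=False)
            self._on_evict(old_key, old_model)  # e.g. free GPU memory here
        model = self._loader(key)
        self._models[key] = model
        return model

    def keys(self) -> List[K]:
        """Cached keys, least recently used first."""
        return list(self._models)
```

Evicting before loading is the important design choice: loading first and evicting afterwards would briefly hold `capacity + 1` models, which is exactly the spike that causes OOM on a tight host.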
Translating ~385 chars (README excerpt) English→German:
| Model | Time | Speed | Relative |
|---|---|---|---|
| opus-mt | 0.93s | 412 chars/sec | 1.0x baseline |
| m2m100 | 5.02s | 76 chars/sec | 5.4x slower |
| hymt | 44.35s | 9 chars/sec | 48x slower |
| madlad | 105.53s | 4 chars/sec | 114x slower |
Tested on AMD Ryzen 9 9950X / NVIDIA RTX A4000 16GB, CPU inference. GPU is significantly faster for hymt/madlad.
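The Speed and Relative columns are two simple ratios: characters divided by seconds, and each model's time divided by the opus-mt time. This small sketch recomputes them from the table's timings; small differences from the table are expected because the character count is approximate and the timings are rounded.

```python
from typing import Dict

CHARS = 385  # approximate excerpt length from the benchmark above
SECONDS: Dict[str, float] = {"opus-mt": 0.93, "m2m100": 5.02, "hymt": 44.35, "madlad": 105.53}


def chars_per_second(chars: int, seconds: float) -> float:
    """Throughput: characters translated per second."""
    return chars / seconds


def relative_slowdown(seconds: Dict[str, float], baseline: str = "opus-mt") -> Dict[str, float]:
    """How many times slower each model is than the baseline, to one decimal."""
    base = seconds[baseline]
    return {name: round(t / base, 1) for name, t in seconds.items()}


print(relative_slowdown(SECONDS))
# {'opus-mt': 1.0, 'm2m100': 5.4, 'hymt': 47.7, 'madlad': 113.5}
```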
HY-MT:

```bash
HYMT_MODEL_SIZE=1.8B     # 1.8B (default) or 7B
HYMT_TOP_K=20            # Top-k sampling
HYMT_TOP_P=0.6           # Nucleus sampling
HYMT_TEMPERATURE=0.7     # Generation temperature
```

MADLAD:

```bash
MADLAD_MODEL_SIZE=3b     # 3b (default), 7b, or 10b
MADLAD_MAX_LENGTH=256    # Max output length
```

API Endpoints
Full API documentation at /docs (Swagger UI).
POST /translate:

```bash
curl -X POST http://localhost:8000/translate \
  -H 'Content-Type: application/json' \
  -d '{"text": ["Hello world"], "target_lang": "de", "source_lang": "en", "beam_size": 1}'
```

Other endpoints:
- `GET /lang_pairs` - All supported language pairs
- `POST /language_detection` - Detect language of input text
- `GET /healthz` - Health check
- `GET /readyz` - Readiness check
- `GET /cache` - Model cache status
- `GET /discover/opus-mt|mbart50|m2m100|all` - Available pairs
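From Python, the same `POST /translate` call needs only the standard library. A sketch: the payload mirrors the curl example above, the helper names are hypothetical, and `translate` returns the parsed JSON unchanged because the exact response shape of the standard endpoint is best checked in `/docs`.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable


def build_translate_request(texts: Iterable[str], target_lang: str, source_lang: str = "en",
                            beam_size: int = 1,
                            base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /translate."""
    payload = {"text": list(texts), "target_lang": target_lang,
               "source_lang": source_lang, "beam_size": beam_size}
    return urllib.request.Request(
        f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(texts: Iterable[str], target_lang: str, source_lang: str = "en",
              timeout: float = 150.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON body (requires a running server)."""
    request = build_translate_request(texts, target_lang, source_lang)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `translate(["Hello world"], "de")` performs the same request as the curl command.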
Configuration Reference
```bash
USE_GPU=auto                  # true|false|auto
DEVICE=auto                   # cpu|cuda|cuda:0
MODEL_FAMILY=opus-mt          # opus-mt|mbart50|m2m100|hymt|madlad
AUTO_MODEL_FALLBACK=1         # Try other families if pair unavailable
MODEL_CACHE_DIR=/models       # Cache directory for volume mapping
MAX_CACHED_MODELS=10          # LRU cache capacity

EASYNMT_BATCH_SIZE=16         # 16 CPU, 64 GPU
EASYNMT_MODEL_ARGS='{"torch_dtype":"fp16"}'  # FP16 for GPU
PRELOAD_MODELS="en->de,de->en"
MAX_INFLIGHT_TRANSLATIONS=1   # auto: 1 GPU, 4 CPU

ENABLE_MEMORY_MONITOR=1       # Auto-evict on high memory
MEMORY_CRITICAL_THRESHOLD=90.0
GPU_MEMORY_CRITICAL_THRESHOLD=90.0

CHUNK_CACHE_ENABLED=1
CHUNK_CACHE_CAPACITY=10000    # ~25MB at capacity
CHUNK_CACHE_MAX_AGE=3600      # TTL in seconds

ENABLE_QUEUE=1
MAX_QUEUE_SIZE=1000
TIMEOUT=120

LOG_LEVEL=INFO                # DEBUG for verbose
LOG_TO_FILE=1                 # Enable file logging
LOG_FORMAT=json               # For log aggregation
```

Markdown sanitization prevents parser depth errors in translated markdown:

```bash
MARKDOWN_SANITIZE=1
MARKDOWN_SAFE_MODE_AUTO=1     # Auto-enable for RTL targets
```

Troubleshooting
| Issue | Solution |
|---|---|
| GPU not used | Add --gpus all flag to docker run |
| Models re-download | Add -v ./model-cache:/models -e MODEL_CACHE_DIR=/models |
| Out of memory | Use FP16: -e EASYNMT_MODEL_ARGS='{"torch_dtype":"fp16"}' |
| 429 errors | Increase MAX_QUEUE_SIZE or check Retry-After header |
| Missing pair | Enable auto-fallback (default ON) or try a different model family |
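On the client side, the polite response to a 429 is to wait the number of seconds given in `Retry-After` and try again. A minimal sketch, assuming the header carries delta-seconds (HTTP also allows an HTTP-date, which this sketch simply treats as the default delay); the `send` callable and all helper names are hypothetical, which keeps the retry logic independent of any particular HTTP library.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_after_seconds(value: Optional[str], default: float = 1.0) -> float:
    """Parse a delta-seconds Retry-After value; fall back to `default` otherwise."""
    try:
        return max(0.0, float(value)) if value is not None else default
    except ValueError:
        return default


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call `send` until it returns something other than 429 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(retry_after_seconds(headers.get("Retry-After")))
    raise ValueError("max_attempts must be >= 1")


def urllib_sender(request: urllib.request.Request, timeout: float = 150.0) -> Callable[[], Response]:
    """Adapt urllib so HTTP errors come back as (status, headers, body) instead of exceptions."""

    def send() -> Response:
        try:
            with urllib.request.urlopen(request, timeout=timeout) as resp:
                return resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:
            return err.code, err.headers, err.read()

    return send
```

For a real call, wrap a prepared `urllib.request.Request` as `send_with_retry(urllib_sender(request))`.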
Building from Source
Standalone executable:

```bash
pip install pyinstaller
pyinstaller mostlylucid_nmt.spec --clean
```

Docker:

```bash
docker build -f Dockerfile.min -t mostlylucid-nmt:cpu .
docker build -f Dockerfile.gpu.min -t mostlylucid-nmt:gpu .
```

See BUILD.md for full instructions.
License: MIT
For strict EasyNMT response shapes, use the `/compat` endpoints:

```bash
# GET (EasyNMT format)
curl "http://localhost:8000/compat/translate?target_lang=de&text=Hello&source_lang=en"
# => { "translations": ["Hallo"] }

# POST (EasyNMT format)
curl -X POST http://localhost:8000/compat/translate \
  -H 'Content-Type: application/json' \
  -d '{"text": ["Hello"], "target_lang": "de", "source_lang": "en"}'
# => { "target_lang": "de", "source_lang": "en", "translated": ["Hallo"], "translation_time": 0.1 }
```

Standard `/translate` endpoints include optional extras (`pivot_path`, `metadata`, etc.).
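If a client has to accept both compat response shapes, a small normalizer keeps the calling code simple. A sketch with hypothetical helper names; it relies only on the two example responses for the compat endpoints (`translations` for GET, `translated` for POST).

```python
from typing import Any, List, Mapping
from urllib.parse import urlencode


def compat_get_url(text: str, target_lang: str, source_lang: str = "en",
                   base_url: str = "http://localhost:8000") -> str:
    """Build the EasyNMT-style GET URL; urlencode escapes spaces and non-ASCII text."""
    query = urlencode({"target_lang": target_lang, "text": text, "source_lang": source_lang})
    return f"{base_url}/compat/translate?{query}"


def extract_translations(body: Mapping[str, Any]) -> List[str]:
    """Return the translated strings from either compat response shape."""
    if "translations" in body:  # GET /compat/translate
        return list(body["translations"])
    if "translated" in body:  # POST /compat/translate
        return list(body["translated"])
    raise KeyError("expected a 'translations' or 'translated' field")
```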