Translation API Benchmark (FLORES + COMET)


A reproducible benchmark evaluating translation quality across 20 languages using the FLORES dataset and modern metrics, powered by the TranslatePlus API.

👉 Includes real-world evaluation of APIs like DeepL, Google Translate, and Azure.


Quick Start (Run in 30 seconds)

```bash
git clone https://github.com/translateplus/translate-api-benchmark.git
cd translate-api-benchmark

pip install -r requirements.txt
python benchmark.py
```

⚡ Try the API (copy-paste)

```python
import requests

url = "https://api.translateplus.io/v2/translate"

headers = {
    "X-API-KEY": "your_api_key",
    "Content-Type": "application/json"
}

payload = {
    "text": "Hello world",
    "source": "en",
    "target": "fr"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
```

👉 Response:

```json
{
  "translations": {
    "translation": "Bonjour le monde",
    "source": "en",
    "target": "fr"
  }
}
```
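Given the response shape above, pulling the translated string out of the parsed JSON is a one-liner:

```python
# Response body from the example above, as returned by response.json()
resp = {
    "translations": {
        "translation": "Bonjour le monde",
        "source": "en",
        "target": "fr",
    }
}

translated = resp["translations"]["translation"]
print(translated)  # Bonjour le monde
```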

📊 Key Results

  • COMET scores up to 0.92 (near human-level)
  • Strong performance across European, Asian, and global languages
  • Stable latency: ~0.4–0.48s

👉 Full dataset: https://huggingface.co/datasets/meetsohail/translateplus-flores-benchmark


Benchmark Visualizations

Charts included in the repository:

  • BLEU Scores
  • COMET Scores
  • Latency


Why COMET > BLEU

  • BLEU measures word overlap
  • COMET measures meaning

👉 BLEU is especially unreliable for languages without whitespace-delimited words, such as:

  • Japanese
  • Chinese
  • Korean

👉 COMET provides more realistic evaluation
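The word-overlap problem can be seen with a toy score (a deliberately simplified stand-in for BLEU, not the real metric): a paraphrase that preserves the meaning still scores near zero because the surface words differ.

```python
# Toy unigram-overlap score (NOT real BLEU) to illustrate why
# word-overlap metrics punish valid paraphrases.
def unigram_overlap(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    matches = sum(1 for tok in hyp if tok in ref)
    return matches / len(hyp)

same_words = unigram_overlap("the cat sat on the mat", "the cat sat on the mat")
paraphrase = unigram_overlap("a kitty was sitting on a rug", "the cat sat on the mat")

print(same_words)  # 1.0
print(paraphrase)  # ~0.14, even though the meaning is close
```

A meaning-based metric like COMET scores both hypotheses similarly; this is the gap the benchmark's COMET numbers are meant to close.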


📁 Dataset

  • FLORES (Meta AI)
  • ~500–997 samples per language
  • 20 languages (English → target)

Structure:

```
data/results_eng_Latn_fra_Latn.csv
data/results_eng_Latn_deu_Latn.csv
...
```

Each file contains the columns:

```
source, reference, hypothesis, latency
```
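A result file in this format can be read with the standard library alone; the two sample rows below are made up for illustration, but follow the column layout above.

```python
import csv
import io

# Hypothetical two-row sample in the results_*.csv format described above
sample = io.StringIO(
    "source,reference,hypothesis,latency\n"
    "Hello world,Bonjour le monde,Bonjour le monde,0.42\n"
    "Good morning,Bonjour,Bonjour,0.45\n"
)

rows = list(csv.DictReader(sample))
mean_latency = sum(float(r["latency"]) for r in rows) / len(rows)
print(round(mean_latency, 3))  # 0.435
```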

⚙️ Benchmark Pipeline

1. Load dataset

```python
from datasets import load_dataset

dataset = load_dataset("facebook/flores", "eng_Latn")
```

2. Translate

```python
def translate(text, target):
    # Placeholder: plug in any API (DeepL, Google, TranslatePlus, etc.)
    # and return the translated string for `text`.
    raise NotImplementedError
```
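A minimal concrete version of this step, wired to the TranslatePlus endpoint shown earlier. The `post` parameter is an illustrative addition so the function can be exercised without a network call; `"your_api_key"` is a placeholder as in the example above.

```python
def translate(text, target, source="en", post=None):
    """Translate `text` via the TranslatePlus v2 endpoint (sketch)."""
    if post is None:
        import requests  # real HTTP path; inject `post` to stub it out
        post = requests.post
    resp = post(
        "https://api.translateplus.io/v2/translate",
        json={"text": text, "source": source, "target": target},
        headers={"X-API-KEY": "your_api_key", "Content-Type": "application/json"},
    )
    return resp.json()["translations"]["translation"]
```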

3. Evaluate (COMET)

```python
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# Each item needs the source ("src"), machine translation ("mt"),
# and reference ("ref") segments.
data = [{"src": source, "mt": hypothesis, "ref": reference}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)
```

📈 Example Results

| Language   | BLEU | COMET |
|------------|------|-------|
| French     | 50.0 | 0.89  |
| German     | 40.4 | 0.89  |
| Portuguese | 48.3 | 0.90  |
| Japanese   | 1.8  | 0.92  |

👉 BLEU is unreliable for some languages


🔧 Requirements

```
datasets
sacrebleu
unbabel-comet
pandas
requests
```

💡 Use Cases

  • Compare translation APIs
  • Evaluate multilingual systems
  • Build translation pipelines
  • Research in machine translation

🤝 Contributing

PRs welcome!

Ideas:

  • add more languages
  • add new APIs
  • improve evaluation

⭐ Support

If this helped you:

  • Star the repo
  • Share with others
  • Contribute improvements


📜 License

MIT License
