
Add NLLB 600M translation-scoring web prototype with interactive span highlighting #1074

Draft
Copilot wants to merge 5 commits into master from copilot/add-evaluation-for-encoder-decoder-models

Copilot AI commented May 20, 2026

This PR adds a prototype browser app for interactive translation scoring: users enter source/target text side-by-side, choose language codes from the NLLB-200 set, and get low-probability spans highlighted with click-to-view replacement suggestions. Scoring is re-run on each text/language change using the NLLB 600M model on the server.

  • Server: scoring API + NLLB 600M integration

    • Added silnlp/nmt/translation_scorer_webapp.py with a lightweight HTTP server.
    • Uses facebook/nllb-200-distilled-600M via transformers (AutoTokenizer + AutoModelForSeq2SeqLM).
    • Exposes:
      • GET / for the prototype UI
      • GET /api/languages for supported language codes
      • POST /api/score returning flagged spans and ranked suggestions
    • Configures tokenizer/model language settings (src_lang, tgt_lang, forced_bos_token_id) per request.
  • UI: side-by-side editing + reactive scoring

    • Added an inlined UI with source/target text areas and source/target language dropdowns populated from the NLLB language tags.
    • Target text rendering includes inline highlights for low-probability spans.
    • Clicking a highlighted span reveals candidate phrase replacements and improvement deltas.
    • Debounced rescoring on source text, target text, or language selection changes.
  • Flag shaping for display

    • Added helper logic to convert word-index spans into character spans for precise UI highlighting.
    • Added overlap resolution to prefer broader phrase spans over nested word-level spans where they collide.
  • CLI entry point

    • Registered silnlp-nmt-score-webapp in pyproject.toml for direct launch of the prototype server.
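The three server routes listed above can be served with the standard library alone. The following is a minimal sketch, not the PR's translation_scorer_webapp.py: it routes GET /, GET /api/languages and POST /api/score to a hypothetical `score_stub` function standing in for the NLLB model, and it assumes a JSON request body with `source`, `target`, `src_lang` and `tgt_lang` fields. Those field names, the three-code language list, and the placeholder HTML are all illustrative assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Illustrative subset of NLLB-200 language codes; the real app exposes the full set.
LANGUAGES = ["eng_Latn", "fra_Latn", "spa_Latn"]
INDEX_HTML = b"<!doctype html><title>Translation scorer</title>"  # placeholder UI
REQUIRED = ("source", "target", "src_lang", "tgt_lang")  # assumed request fields


def score_stub(source: str, target: str, src_lang: str, tgt_lang: str) -> dict:
    """Hypothetical stand-in for NLLB scoring; flags nothing. Per the PR description,
    the real server configures the tokenizer/model languages per request here."""
    return {"translation": target, "flags": []}


class ScorerHandler(BaseHTTPRequestHandler):
    def _send(self, status: int, body: bytes, content_type: str) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _send_json(self, payload: dict, status: int = 200) -> None:
        self._send(status, json.dumps(payload).encode("utf-8"), "application/json")

    def do_GET(self) -> None:
        if self.path == "/":
            self._send(200, INDEX_HTML, "text/html; charset=utf-8")
        elif self.path == "/api/languages":
            self._send_json({"languages": LANGUAGES})
        else:
            self._send_json({"error": "not found"}, status=404)

    def do_POST(self) -> None:
        if self.path != "/api/score":
            self._send_json({"error": "not found"}, status=404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            req = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            self._send_json({"error": "invalid JSON"}, status=400)
            return
        if not isinstance(req, dict) or any(k not in req for k in REQUIRED):
            self._send_json({"error": f"expected fields: {list(REQUIRED)}"}, status=400)
            return
        if req["src_lang"] not in LANGUAGES or req["tgt_lang"] not in LANGUAGES:
            self._send_json({"error": "unsupported language code"}, status=400)
            return
        self._send_json(score_stub(*(req[k] for k in REQUIRED)))

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 asks the OS for a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), ScorerHandler)
```

Calling `make_server().serve_forever()` would serve the sketch on http://127.0.0.1:8000; the port and loopback binding are assumptions, not the defaults of the silnlp-nmt-score-webapp entry point.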
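```python
# server-side scoring flow (simplified); handle_score is an illustrative name,
# and TranslationScorer, model, tokenizer and _format_flags come from the PR (not shown)
def handle_score(source: str, translation: str) -> dict:
    scorer = TranslationScorer(model, tokenizer, low_prob_threshold=-3.0, top_k_suggestions=5)
    scored = scorer.score(source, translation)
    return {"translation": scored.translation, "flags": _format_flags(scored)}
```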
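The body of `_format_flags` is not shown in the PR description. The sketch below is one hypothetical way to implement the two display steps listed under "Flag shaping for display": mapping half-open word-index spans to character offsets, then dropping nested word-level spans that collide with broader phrase spans. It assumes the translation is split into words on whitespace; `word_char_offsets`, `to_char_span` and `resolve_overlaps` are illustrative names, not the PR's helpers.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end)


def word_char_offsets(text: str) -> List[Span]:
    """Character offsets of each whitespace-delimited word in `text`."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def to_char_span(text: str, start_word: int, end_word: int) -> Span:
    """Map a half-open word-index span to the characters it covers."""
    offsets = word_char_offsets(text)
    return offsets[start_word][0], offsets[end_word - 1][1]


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Keep broader spans; drop any span that collides with one already kept."""
    kept: List[Span] = []
    # Widest spans first (key is -length, then start), so phrase-level flags
    # win over the word-level flags nested inside them.
    for start, end in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


text = "bleu maison rouge"
phrase = to_char_span(text, 0, 2)  # "bleu maison"
word = to_char_span(text, 1, 2)    # "maison", nested inside the phrase
other = to_char_span(text, 2, 3)   # "rouge"
highlights = resolve_overlaps([word, phrase, other])  # the nested "maison" span is dropped
```

This greedy rule also settles partial overlaps by keeping the longer span, which may not match the PR's exact policy. Whitespace splitting is likewise an assumption; it does not work for scripts written without spaces between words.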
Original prompt

I'd like to create some totally new functionality in this project. This project trains and tests encoder-decoder models for translation. For the new functionality, I'd like to be able to give the software an encoder-decoder model, along with a source text and a translation, and have it evaluate the probability that the model assigns to each target token. For example, if the source text is "blue house" and the translation is "bleu maison" (a mistranslation), the code should first evaluate the model's conditional probability of "bleu" as the first token given the source sentence. It should then force "bleu" as the first word of the translation and evaluate the model's conditional probability of "maison" given both "bleu" and the source sentence. It should continue in this manner, evaluating the conditional probability of each next token given the source sentence and the previously observed tokens in the translation. Once it has finished calculating these probabilities for every token in the translation, it should try to find low-probability words and phrases in the translation, pairing each of them with suggested translations from the model.
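The procedure in the prompt is teacher-forced scoring under the chain rule, log P(y | x) = Σ_t log P(y_t | y_<t, x): each target token is scored conditioned on the observed prefix rather than on the model's own predictions. A minimal sketch follows. It assumes a hypothetical `model(source, prefix)` callable that returns next-token log-probabilities, and it uses a hand-written toy distribution for the "blue house" example in place of NLLB. The -3.0 threshold and top_k=5 mirror the values in the server snippet, and the rest of the interface is illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface (not the PR's API): given the source text and the
# observed target prefix, return log-probabilities for candidate next tokens.
NextTokenFn = Callable[[str, Sequence[str]], Dict[str, float]]


def score_tokens(source: str, target: Sequence[str], model: NextTokenFn) -> List[float]:
    """Teacher-forced scores: log P(y_t | y_<t, x) for each observed target token."""
    scores = []
    for t, token in enumerate(target):
        dist = model(source, target[:t])  # condition on the observed prefix
        scores.append(dist.get(token, -math.inf))
    return scores


def flag_low_prob(
    source: str,
    target: Sequence[str],
    model: NextTokenFn,
    threshold: float = -3.0,
    top_k: int = 5,
) -> List[Tuple[int, str, float, List[str]]]:
    """Flag tokens scoring below `threshold`, each with up to `top_k` alternatives."""
    flags = []
    for t, log_prob in enumerate(score_tokens(source, target, model)):
        if log_prob < threshold:
            dist = model(source, target[:t])
            ranked = sorted(dist, key=dist.get, reverse=True)
            suggestions = [tok for tok in ranked if tok != target[t]][:top_k]
            flags.append((t, target[t], log_prob, suggestions))
    return flags


def toy_model(source: str, prefix: Sequence[str]) -> Dict[str, float]:
    """Hand-written distributions for the "blue house" example; not a real model."""
    if len(prefix) == 0:
        return {"la": math.log(0.6), "une": math.log(0.3), "bleu": math.log(0.01)}
    return {"maison": math.log(0.2), "ciel": math.log(0.1)}


# "bleu" (p = 0.01) is flagged with suggestions ["la", "une"];
# "maison" (p = 0.2, log p ≈ -1.6) stays above the threshold.
flags = flag_low_prob("blue house", ["bleu", "maison"], toy_model)
```

With a real `AutoModelForSeq2SeqLM`, the per-token loop is usually replaced by a single forward pass that takes the tokenized target as `labels`, because the decoder already produces a next-token distribution at every position of the shifted target. NLLB scores SentencePiece subword pieces rather than words, so word- or phrase-level flags need to aggregate piece scores; this sketch uses whole words for readability.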

Created from VS Code.



Copilot AI and others added 2 commits May 20, 2026 19:08
…corer.py, NMTModel.score_translation()

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/c9d942dc-1d4a-4f20-b396-c522dcb080bc

Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>
…ripping

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/c9d942dc-1d4a-4f20-b396-c522dcb080bc

Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>
Copilot AI changed the title [WIP] Add evaluation functionality for encoder-decoder models Add translation scoring: forced-decoding token probabilities with low-probability word detection and suggestions May 20, 2026
Copilot AI requested a review from benjaminking May 20, 2026 19:10
Copilot AI changed the title Add translation scoring: forced-decoding token probabilities with low-probability word detection and suggestions Add contextual phrase scoring and rescored replacement suggestions for NMT translations May 27, 2026
Copilot AI changed the title Add contextual phrase scoring and rescored replacement suggestions for NMT translations Add NLLB 600M translation-scoring web prototype with interactive span highlighting May 28, 2026
