Summary
Scenario 4 reached 10/10 only after stopword filtering and trailing-s stemming were added to the coverage computation. Removing either one drops it to 7/10. Both are English-specific linguistic priors, not a language-agnostic threshold, but neither the code nor the docs currently say so.
What to do
- Minimum (honest): rename the function and document in-code that coverage is English-biased in its current form. Update the README claim about "no embeddings, no linguistic priors" to reflect the stopword filtering and morphology normalization.
- Better: replace both normalizations with something language-agnostic (e.g., IDF-like weighting derived from the trie's own visit counts, or character-n-gram coverage). The replacement must hold or improve the Scenario 4 grade without a hardcoded English vocabulary.
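
For the character-n-gram option, a minimal sketch might look like the following. It is not the current code in src/mcp/responder.rs; the names `char_ngrams` and `ngram_coverage`, the per-token windowing, and the choice of `n` are all assumptions made for illustration. Whether it holds the Scenario 4 grade is untested.

```rust
use std::collections::HashSet;

/// Hypothetical helper: the set of character n-grams in `text`.
/// Text is lowercased and split on non-alphanumeric characters, and n-grams
/// are taken within each token. No stopword list and no stemming rule.
/// Tokens shorter than `n` contribute nothing.
fn char_ngrams(text: &str, n: usize) -> HashSet<String> {
    let mut grams = HashSet::new();
    if n == 0 {
        // `windows(0)` would panic, so treat n = 0 as "no n-grams".
        return grams;
    }
    let lowered = text.to_lowercase();
    for token in lowered.split(|c: char| !c.is_alphanumeric()) {
        let chars: Vec<char> = token.chars().collect();
        for window in chars.windows(n) {
            grams.insert(window.iter().collect::<String>());
        }
    }
    grams
}

/// Hypothetical coverage score: the fraction of the query's distinct
/// n-grams that also occur in the candidate. Returns 0.0 when the query
/// yields no n-grams.
fn ngram_coverage(query: &str, candidate: &str, n: usize) -> f64 {
    let query_grams = char_ngrams(query, n);
    if query_grams.is_empty() {
        return 0.0;
    }
    let candidate_grams = char_ngrams(candidate, n);
    let hits = query_grams
        .iter()
        .filter(|g| candidate_grams.contains(*g))
        .count();
    hits as f64 / query_grams.len() as f64
}
```

With `n = 3`, `ngram_coverage("cats", "cat", 3)` gives 0.5 and `ngram_coverage("gatos", "gato", 3)` gives 2/3, so inflected forms earn partial credit in any language without a hardcoded suffix rule. Function words are not down-weighted by this alone, so it may need to be combined with the IDF-like weighting.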
Acceptance
- Either the docs match the code, or the code drops the English-specific rules without regressing Scenario 4.
- Non-English paraphrase test (see M2) does not silently depend on an English preprocessor.
Links
src/mcp/responder.rs — coverage + normalization.
docs/design/honest_agent/progress.md Phase B paraphrase entry.
tests/honest_agent_paraphrase.rs