Testing: EvidenceExtractor abbreviation-aware splitting has no test coverage #2202

@mrveiss

Problem

PR #2192 (#2170) added abbreviation-aware sentence splitting to EvidenceExtractor._split_sentences() with a _ABBREV_RE pattern that merges fragments after common abbreviations (Dr., U.S.A., etc.).
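For readers unfamiliar with the approach, here is a minimal sketch of split-then-merge sentence splitting. It is not the code from PR #2192: the `ABBREV_RE` pattern, the `split_sentences` function name, and the list of titles are all assumptions made for illustration, and the project's real `_ABBREV_RE` may cover different abbreviations.

```python
import re

# Hypothetical pattern (NOT the project's _ABBREV_RE): matches a fragment that
# ends in a known title ("Dr.") or a dotted initialism ("U.S.A.", "e.g.").
ABBREV_RE = re.compile(r"(?:\b(?:Dr|Mr|Mrs|Ms|Prof|vs)\.|\b(?:[A-Za-z]\.){2,})$")


def split_sentences(text: str) -> list[str]:
    """Split on ., ?, ! and then re-join fragments that end in an abbreviation."""
    # Step 1: naive split after sentence punctuation followed by whitespace.
    fragments = re.split(r"(?<=[.!?])\s+", text.strip())

    # Step 2: if the previous piece ends with an abbreviation, the split was
    # spurious, so glue the current fragment back onto it.
    sentences: list[str] = []
    for fragment in fragments:
        if sentences and ABBREV_RE.search(sentences[-1]):
            sentences[-1] = f"{sentences[-1]} {fragment}"
        else:
            sentences.append(fragment)

    # Drop the empty string produced when the input is empty.
    return [s for s in sentences if s]
```

A known limitation of this sketch: a sentence that genuinely ends with an abbreviation ("He moved to the U.S.A. Then he left.") is merged with the next one. Whether the real implementation shares that behavior is exactly the kind of thing a regression test would pin down.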

However, no tests were added for the new abbreviation handling. The existing TestSplitSentences class in evidence_extractor_test.py only tests basic delimiters (period, question mark, exclamation mark) and doesn't verify:

  • "Dr. Smith diagnosed the issue." stays as one sentence
  • "The U.S.A. deployed Redis." stays as one sentence
  • Mixed abbreviations with real sentence breaks

Discovered During

Post-implementation review of #2170 fix.

Suggested Fix

Add 2-3 test cases to TestSplitSentences in evidence_extractor_test.py:

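def test_abbreviation_not_split(self):
    # `extractor` is assumed to be an EvidenceExtractor instance already available to this test class.
    parts = extractor._split_sentences("Dr. Smith said hello. Then left.")
    # "Dr." must not end a sentence; only the period after "hello" does.
    assert len(parts) == 2
    assert parts[0] == "Dr. Smith said hello."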
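To cover all three gaps listed under Problem, the cases could also be kept in one table and checked by sentence count. The sketch below is only a suggestion: `ABBREVIATION_CASES` and `check_abbreviation_cases` are hypothetical names, the fourth input is an invented example of a mixed case, and the expected counts are assumptions about the intended behavior rather than verified output of `_split_sentences()`.

```python
# Each case pairs an input with the number of sentences expected back.
# Expected counts are assumptions based on the behavior described in this issue.
ABBREVIATION_CASES = [
    ("Dr. Smith diagnosed the issue.", 1),
    ("The U.S.A. deployed Redis.", 1),
    ("Dr. Smith said hello. Then left.", 2),
    ("The U.S.A. deployed Redis. Dr. Smith approved it.", 2),
]


def check_abbreviation_cases(split_sentences):
    """Return (text, expected, actual) for every case whose count differs.

    `split_sentences` is any callable that maps a string to a list of
    sentences, for example a bound `_split_sentences` method.
    """
    failures = []
    for text, expected in ABBREVIATION_CASES:
        actual = len(split_sentences(text))
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

Inside `TestSplitSentences` this would reduce to a single assertion such as `assert check_abbreviation_cases(extractor._split_sentences) == []`, and the returned tuples show which inputs regressed. Checking counts rather than exact strings keeps the test independent of how the splitter trims whitespace.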

Impact

Low — the fix works correctly but lacks regression test coverage.
