Open
Description
Problem
PR #2192 (#2170) added abbreviation-aware sentence splitting to EvidenceExtractor._split_sentences() with a _ABBREV_RE pattern that merges fragments after common abbreviations (Dr., U.S.A., etc.).
However, no tests were added for the new abbreviation handling. The existing TestSplitSentences class in evidence_extractor_test.py only tests basic delimiters (period, question mark, exclamation mark) and doesn't verify:
- "Dr. Smith diagnosed the issue." stays as one sentence
- "The U.S.A. deployed Redis." stays as one sentence
- Mixed abbreviations with real sentence breaks
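The three unverified cases above could be covered roughly as follows. This is a minimal, self-contained sketch, not the project's code: `split_sentences` and the `_ABBREV_RE` pattern shown here are hypothetical stand-ins for `EvidenceExtractor._split_sentences()` and the pattern from PR #2192, whose actual implementation is not shown in this issue. In the real test file the assertions would call the extractor instead.

```python
import re
from typing import List

# ASSUMPTION: stand-in pattern, not the real _ABBREV_RE from PR #2192.
# Matches a fragment ending in a common title ("Dr.") or a dotted
# initialism ("U.S.A.").
_ABBREV_RE = re.compile(r"(?:\b(?:Dr|Mr|Mrs|Ms|Prof)\.|(?:[A-Z]\.){2,})$")


def split_sentences(text: str) -> List[str]:
    """Hypothetical stand-in for EvidenceExtractor._split_sentences()."""
    # Naive split on whitespace that follows '.', '?' or '!'.
    fragments = re.split(r"(?<=[.!?])\s+", text.strip())
    merged: List[str] = []
    for fragment in fragments:
        # Re-attach a fragment when the previous one ended in an abbreviation.
        if merged and _ABBREV_RE.search(merged[-1]):
            merged[-1] = merged[-1] + " " + fragment
        else:
            merged.append(fragment)
    return merged


class TestSplitSentences:
    def test_title_abbreviation_not_split(self):
        text = "Dr. Smith diagnosed the issue."
        assert split_sentences(text) == [text]

    def test_initialism_not_split(self):
        text = "The U.S.A. deployed Redis."
        assert split_sentences(text) == [text]

    def test_abbreviations_mixed_with_real_breaks(self):
        parts = split_sentences(
            "Dr. Smith works in the U.S.A. today. Is Redis fast? Yes!"
        )
        assert parts == [
            "Dr. Smith works in the U.S.A. today.",
            "Is Redis fast?",
            "Yes!",
        ]
```

Asserting on the full list, rather than only on its length, also catches a regression where the split count stays the same but the break lands in the wrong place. Note that this stand-in would wrongly merge a sentence that genuinely ends in an initialism with the one after it; whether the real pattern shares that limitation would need checking against the PR.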
Discovered During
Post-implementation review of #2170 fix.
Suggested Fix
Add 2-3 test cases to TestSplitSentences in evidence_extractor_test.py:
def test_abbreviation_not_split(self):
    parts = extractor._split_sentences("Dr. Smith said hello. Then left.")
    assert len(parts) == 2
    assert parts[0] == "Dr. Smith said hello."
Impact
Low — the fix works correctly but lacks regression test coverage.