fix(#21): Enforce strict parsing of LLM method output#47
Closed
AK11105 wants to merge 4 commits into
Closed
Conversation
- Split on ^def boundaries using MULTILINE flag instead of lookahead - Avoids conflating methods when LLM wraps output in class body - Handles missing blank lines between methods - Properly isolates trailing text after predict() - Validates both load and predict are found, raising clear error if not This fixes: 1. Class-wrapped output where both methods get captured together 2. Missing blank line handling (no blank line = no \ndef boundary) 3. Trailing text after predict being included in the method body Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>"
Add tests for the three main issues the fix addresses: - Class-wrapped output (LLM wraps methods in class despite prompt) - Missing blank line between methods - Trailing text after predict method These tests verify that the new explicit boundary splitting approach correctly handles all edge cases without breaking existing functionality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>"
Split on explicit def boundaries and reject non-compliant output. Prevents conflating methods when LLM violates system prompt (e.g., class wrapper, missing blank lines, trailing text). Raises clear error to trigger LLM retry instead of silent failures. - Split on ^def (start-of-line only, rejects indentation) - Reject trailing content after predict - Validate exactly two methods with correct signatures - Error messages direct to system prompt requirements
Replace lenient edge-case tests with strict validation tests: - test_parse_methods_class_wrapped_raises: Rejects indented methods (class wrapper) - test_parse_methods_trailing_text_raises: Rejects trailing content Keep test_parse_methods_no_blank_line_between: Valid format with missing blank line. Enforces system prompt contract: "Return ONLY the two method bodies as plain Python"
Owner
Author
|
Copilot reeks |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes issue #21:
_parse_methodsconflates load() and predict() into one block.Problem
The original implementation used lookahead regex (
(?=\ndef |\Z)) to split methods from LLM output. This failed in three scenarios:load_matchandpredict_matchcapture the entire class body, causing both methods to be identical.\ndefdoesn't match properly.predict()with no subsequentdefgets included inpredict_body.The bug manifests at runtime:
self._modelis never set in the_GeneratedModelclass, raisingAttributeErroron first prediction.Solution: Strict Parsing
Replace lookahead regex with explicit
defboundary splitting AND enforce the system prompt contract:This approach:
def— Rejects indented methods (class wrappers)fix()self._modelWhy Strict?
If we accept edge cases (class wrappers, trailing text), the LLM learns it can ignore the system prompt. Strict parsing enforces the contract, preventing more bugs downstream.
Testing
test_parse_methods_no_blank_line_between✅ Accepts (valid: missing blank line)test_parse_methods_class_wrapped_raises❌ Rejects (violates: no class wrapper)test_parse_methods_trailing_text_raises❌ Rejects (violates: ONLY two methods)All existing tests continue to pass.