Introduce output validation for LLM-generated responses to ensure data consistency #657
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough
A lightweight output validation layer was added: a new …
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant LLMGenerator as LLM Generator
participant LLM
participant Validator as Output Validator
participant API as API Response
Client->>LLMGenerator: request (get_*_llm)
LLMGenerator->>LLM: call LLM, receive raw text
LLMGenerator->>LLMGenerator: parse raw text into question objects
LLMGenerator->>Validator: validate parsed questions
Validator-->>LLMGenerator: validated question list
LLMGenerator->>API: return validated questions
API-->>Client: HTTP 200 with validated payload
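The diagram's order of operations (call the LLM, parse the raw text, validate, return) can be sketched as a small pipeline. This is a minimal illustration, not the PR's code: `run_llm_pipeline`, `parse_json_questions`, `keep_with_question`, and `fake_llm` are hypothetical names, and the parser assumes the model replies with a JSON array, which the source does not state.

```python
import json
from typing import Any, Callable, Dict, List

Question = Dict[str, Any]


def run_llm_pipeline(
    call_llm: Callable[[str], str],
    parse: Callable[[str], List[Any]],
    validate: Callable[[List[Any]], List[Question]],
    prompt: str,
) -> List[Question]:
    """Follow the diagram's order: call LLM, parse raw text, validate, return."""
    raw_text = call_llm(prompt)   # raw text from the model
    parsed = parse(raw_text)      # question objects, possibly malformed
    return validate(parsed)       # only well-formed questions are returned


def parse_json_questions(raw_text: str) -> List[Any]:
    """Hypothetical parser that assumes the model replies with a JSON array."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return []                 # unparseable output yields no questions
    return data if isinstance(data, list) else []


def keep_with_question(items: List[Any]) -> List[Question]:
    """Toy validator: keep dicts whose 'question' is a non-empty string."""
    return [
        q for q in items
        if isinstance(q, dict) and isinstance(q.get("question"), str) and q["question"].strip()
    ]


def fake_llm(prompt: str) -> str:
    """Stub LLM call returning one well-formed and one empty question."""
    return '[{"question": "What is 2 + 2?"}, {"question": ""}]'


result = run_llm_pipeline(fake_llm, parse_json_questions, keep_with_question, "make 2 questions")
print(result)  # [{'question': 'What is 2 + 2?'}]
```

Passing the LLM call, parser, and validator in as functions keeps each step testable on its own, which is the main point of putting validation between parsing and the API response.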
Estimated Code Review Effort
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
🧹 Nitpick comments (2)
backend/Generator/output_validator.py (2)
36-39: Consider validating individual option elements.
The validator checks that `options` is a non-empty list with at least 2 elements, but doesn't verify that each element is a valid string. This could allow malformed options like `[None, 123]` to pass through.
🛡️ Optional: Stricter options validation
 # Check options field
 options = q.get("options")
 if not options or not isinstance(options, list) or len(options) < 2:
     continue
+
+# Verify all options are non-empty strings
+if not all(isinstance(opt, str) and opt.strip() for opt in options):
+    continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/output_validator.py` around lines 36-39, the options list validation in output_validator.py (where options = q.get("options")) must also ensure each element is a non-empty string: replace the current check with logic that verifies options is a list, validates each element with isinstance(opt, str) and opt.strip() != "", and treats the question as invalid (continue) if any element is malformed or fewer than two options remain; reference the variable names options and q.get("options") when making the change.
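The stricter rule from this nitpick can be isolated in a small helper so the option check reads as one condition. This is a minimal sketch; `has_valid_options` and its `min_count` parameter are hypothetical and not part of the PR. Following the suggested diff, one malformed element rejects the whole list rather than being filtered out.

```python
from typing import Any


def has_valid_options(options: Any, min_count: int = 2) -> bool:
    """Return True if options is a list of at least min_count non-empty strings.

    Any malformed element (a non-string, or a whitespace-only string) rejects
    the whole list, matching the suggested `all(...)` check.
    """
    if not isinstance(options, list) or len(options) < min_count:
        return False
    return all(isinstance(opt, str) and opt.strip() for opt in options)


print(has_valid_options(["Paris", "London"]))  # True
print(has_valid_options([None, 123]))          # False
```

Inside the validator loop, the suggested two-step check then collapses to `if not has_valid_options(q.get("options")): continue`.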
17-20: Minor: Docstring doesn't reflect the actual minimum option count.
The code at line 38 requires `len(options) >= 2`, but the docstring only mentions "non-empty list". Consider updating the docstring to document the actual constraint.
📝 Suggested docstring update
 Validation Rules:
 - question field must exist and be non-empty string
- - options field must exist and be non-empty list
+ - options field must exist and be a list with at least 2 elements
 - correct_answer field must exist and be non-empty string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/output_validator.py` around lines 17-20, the docstring block titled "Validation Rules" in output_validator.py currently says "options field must exist and be non-empty list", but the validation rejects any question where len(options) < 2; update that docstring to state that options must be a list with at least two entries (e.g., "options field must exist and be a list with minimum length 2") so the documented constraint matches the implemented check.
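One way to keep the docstring and the check from drifting apart again is to name the minimum once and refer to that name in both places. This is a sketch under assumptions: `MIN_OPTIONS` and `_non_empty_str` are hypothetical, and the rules are taken from the "Validation Rules" docstring quoted above, not from the PR's full source.

```python
from typing import Any, Dict, List

# Hypothetical single source of truth for the minimum number of options.
MIN_OPTIONS = 2


def _non_empty_str(value: Any) -> bool:
    """True if value is a string with at least one non-whitespace character."""
    return isinstance(value, str) and bool(value.strip())


def validate_mcq(questions: List[Any]) -> List[Dict[str, Any]]:
    """Return only the well-formed multiple-choice questions.

    Validation Rules:
    - question field must exist and be non-empty string
    - options field must exist and be a list with at least MIN_OPTIONS (2) elements
    - correct_answer field must exist and be non-empty string
    """
    valid = []
    for q in questions:
        if not isinstance(q, dict):
            continue
        if not _non_empty_str(q.get("question")):
            continue
        options = q.get("options")
        # Same constant as the docstring, so the two cannot silently disagree.
        if not isinstance(options, list) or len(options) < MIN_OPTIONS:
            continue
        if not _non_empty_str(q.get("correct_answer")):
            continue
        valid.append(q)
    return valid
```

Because an f-string cannot serve as a docstring, the docstring names the constant rather than interpolating it; a reviewer changing `MIN_OPTIONS` sees the reference immediately.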
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e8bb1564-41d6-4f3d-849b-6260d6255715
📒 Files selected for processing (2)
backend/Generator/llm_generator.py
backend/Generator/output_validator.py
Addressed Issues:
Fixes #656
Summary
This PR introduces a lightweight output validation layer for LLM-generated question responses to prevent malformed or incomplete data from being returned by the API.
Currently, LLM-based endpoints return parsed outputs without validating their structure or content. This can lead to silent failures where responses contain missing fields, empty values, or inconsistent formats.
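Each failure mode named above (a missing field, an empty value, an inconsistent type) maps to one check. The sketch below illustrates this for short-answer and true/false questions. It is not the PR's implementation: the field names (`question`, `answer`) and the rule that a true/false answer must be a Python `bool` are assumptions, since the description does not spell out the schemas behind `validate_shortq` and `validate_boolq`.

```python
from typing import Any, Dict, List

Question = Dict[str, Any]


def _non_empty_str(value: Any) -> bool:
    """Rejects missing fields (None), wrong types, and empty/whitespace strings."""
    return isinstance(value, str) and bool(value.strip())


def validate_shortq(questions: List[Any]) -> List[Question]:
    """Keep short-answer items with non-empty 'question' and 'answer' strings (assumed schema)."""
    return [
        q for q in questions
        if isinstance(q, dict)
        and _non_empty_str(q.get("question"))
        and _non_empty_str(q.get("answer"))
    ]


def validate_boolq(questions: List[Any]) -> List[Question]:
    """Keep true/false items with a non-empty 'question' and a bool 'answer' (assumed schema)."""
    return [
        q for q in questions
        if isinstance(q, dict)
        and _non_empty_str(q.get("question"))
        and isinstance(q.get("answer"), bool)  # "yes" or 1 count as inconsistent formats
    ]
```

Both functions return a subset of their input list and never modify the items, so a caller that already handled a list of question dicts keeps working unchanged.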
What was implemented
- `backend/Generator/output_validator.py`
  - `validate_mcq`
  - `validate_shortq`
  - `validate_boolq`
- `llm_generator.py`:

How this improves the system
Scope
In Scope
Out of Scope
Screenshots/Recordings:
Not applicable (backend validation improvement)
Additional Notes:
Validation is implemented at the generator level so that malformed questions are filtered out before responses are returned. The implementation is intentionally minimal to align with the existing system design and maintain backward compatibility.
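One way to add validation at the generator level without touching existing call sites is a decorator that runs a validator over whatever the generator returns. This is a swapped-in pattern offered as a sketch, not necessarily how the PR wires validation into `llm_generator.py`; `validated`, `keep_non_empty_questions`, and `get_example_llm` are hypothetical names.

```python
import functools
from typing import Any, Callable, Dict, List

Question = Dict[str, Any]
Validator = Callable[[List[Any]], List[Question]]


def validated(validator: Validator):
    """Decorator factory: pass the wrapped generator's output through `validator`."""
    def decorator(generate: Callable[..., List[Any]]) -> Callable[..., List[Question]]:
        @functools.wraps(generate)  # keep the original name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> List[Question]:
            # Same arguments in, same list-of-dicts shape out; only malformed items are dropped.
            return validator(generate(*args, **kwargs))
        return wrapper
    return decorator


def keep_non_empty_questions(items: List[Any]) -> List[Question]:
    """Toy validator for this example: keep dicts whose 'question' is a non-empty string."""
    return [
        q for q in items
        if isinstance(q, dict) and isinstance(q.get("question"), str) and q["question"].strip()
    ]


@validated(keep_non_empty_questions)
def get_example_llm(text: str, max_questions: int = 2) -> List[Any]:
    """Hypothetical stand-in for an existing get_*_llm generator."""
    raw = [{"question": f"What does '{text}' mean?"}, {"question": "   "}]
    return raw[:max_questions]


print(get_example_llm("entropy"))  # [{'question': "What does 'entropy' mean?"}]
```

The trade-off is that validation becomes less visible inside each generator body; calling the validator explicitly just before `return` is equally backward compatible and easier to read at the call site.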
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: ChatGPT (for guidance and review)
Checklist
Summary by CodeRabbit