Test/llm endpoint comprehensive tests#659
Conversation
…ening to server.py Problem: - No input validation on any API endpoint (empty text, huge payloads, invalid types) - Missing error handling causes 500 errors with stack traces leaked to clients - /getTranscript endpoint vulnerable to command injection via unsanitized video_id - /getTranscript has race condition when concurrent requests use glob on shared folder - File uploads have no size limits or extension validation - No structured logging (bare print statements) - Hardcoded credential paths with no env variable fallback Solution: - Add _validate_input_text() helper: checks for None, empty, type, and max length (50k chars) - Add _validate_max_questions() helper: clamps to safe range [1, 20] - Add _allowed_file() helper: whitelist txt/pdf/docx extensions - Validate video_id against strict regex pattern (alphanumeric, 11 chars) before subprocess - Add 30s timeout to yt-dlp subprocess call - Scope subtitle files to video_id to prevent race conditions between concurrent requests - Add Flask MAX_CONTENT_LENGTH (10 MB) with 413 error handler - Wrap all endpoints in try/except with user-friendly error messages (no stack traces) - Replace print() with Python logging module throughout - Use os.environ.get() for Google service account file path - Add input validation to generate_gform (qa_pairs and question_type) - Use request.get_json(silent=True) to handle malformed JSON gracefully Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix potential NameError in /getTranscript by wrapping clean_transcript in try/except and removing unused expected_vtt variable - Fix input_questions (list) incorrectly passed as num_questions (int) in get_shortq_hard, get_mcq_hard, get_boolq_hard; fix type annotation in QuestionGenerator.generate() from bool to int - Add try/except to generate_gform, remove webbrowser.open_new_tab() server-side anti-pattern, return structured JSON response - Remove unused exception variable 'e' across all except blocks (Ruff F841) - Tighten glob pattern in getTranscript to use dot separator - Add strict=True to zip() in get_mcq_answer for defensive validation Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix critical bug: predict_boolean_answer receives a string instead of a list, causing character-level iteration; now wraps question in [question] - Fix critical bug: answer_style="true_false" is not a valid option in QuestionGenerator; changed to "sentences" in get_boolq_hard - Fix critical bug: make_question_harder in get_boolq_hard receives full dict instead of question string; aligned with other hard-question endpoints - Fix major issue: num_questions silently ignored because use_evaluator defaults to False; added use_evaluator=True to all 3 hard-question endpoints - Fix minor issue: subtitles directory creation moved from __main__ guard to module level so it works under Gunicorn/WSGI deployments - Fix nitpick: num_questions type annotation updated to int | None per PEP 484 Co-authored-by: Cursor <cursoragent@cursor.com>
The length equality check above the loop already guarantees both lists are the same size, making strict=True redundant. Removing it avoids a runtime error on Python versions below 3.10. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the minimal test_server.py (which required a live server and used raw requests) with a proper pytest-based test suite that runs against Flask's built-in test client — no running server or GPU needed. 172 tests covering: - All 15 API endpoints with valid input - Invalid input (empty text, missing fields, wrong types) - Boundary-value validation for max_questions (clamping 1–20) - Edge cases (unicode, emoji, special characters, very long text) - File upload with accepted/rejected file types - HTTP method enforcement (405 on wrong methods) - Error handling (generator/pipeline exceptions → 500) Also adds: - backend/conftest.py: mocks all heavy ML dependencies (transformers, spacy, Generator, Google APIs) at the sys.modules level so the full suite runs in <1 second - .github/workflows/tests.yml: CI workflow that runs tests on every push/PR to main across Python 3.9, 3.10, and 3.11 Co-authored-by: Cursor <cursoragent@cursor.com>
Replace `int | None` with `Optional[int]` in Generator/main.py to ensure compatibility with Python 3.9 and satisfy CodeRabbit review. Co-authored-by: Cursor <cursoragent@cursor.com>
…e check Add one-line docstrings to all 16 route handlers and error handler in server.py, and all ~80 test methods in test_server.py to satisfy the 80% docstring coverage threshold flagged by CodeRabbit pre-merge checks. Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix path traversal vulnerability in upload_file() by sanitizing filename with werkzeug.utils.secure_filename before processing - Fix credentials loading bug in generate_gform() — add store.get() so cached credentials are reused instead of re-running OAuth flow - Use logger.exception instead of logger.error in subprocess handlers for better diagnostics with tracebacks - Add safety comment explaining VIDEO_ID_PATTERN guards subtitle_path - Remove unused mock_mcq_gen fixture arg from test_max_questions_float - Add ClassVar annotations to mutable class attributes (Ruff compliance) - Add 6 new tests: transcript happy-path, timeout (504), subprocess error (500), no subtitles (404), get_content happy-path, and service-unavailable (503) — total now 178 tests Co-authored-by: Cursor <cursoragent@cursor.com>
Merged origin/main into test/backend-comprehensive-tests, resolving conflicts in backend/server.py and backend/test_server.py: - server.py: kept PR's import organization, input validation, and consistent logger usage; restored LLMQuestionGenerator import from upstream - test_server.py: kept comprehensive pytest-based test suite over old-style test functions
- CRITICAL: /get_boolq_hard now uses BoolQGen.generate_boolq() instead of qg.generate(answer_style='sentences') which was generating sentence-based questions instead of boolean questions - Add input validation (_validate_input_text, _validate_max_questions) to all LLM endpoints (get_shortq_llm, get_mcq_llm, get_boolq_llm, get_problems_llm) for consistency with other endpoints - Replace app.logger with configured logger in LLM endpoints - Add strict=True to zip() in get_mcq_answer for defense-in-depth - Fix en-dash to hyphen-minus in _validate_input_text docstring
…onGenerator parsing logic - Add LLM generator mock infrastructure to conftest.py (defaults, fixture, reset logic) - Add ~40 endpoint tests for /get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm - Create test_llm_generator.py with ~35 unit tests for JSON parsing, fallback parsing, boolean coercion - Add pytest.ini with filterwarnings to suppress third-party deprecation warnings - Include LLM endpoints in TestInputValidation and TestHTTPMethods parametrized suites - Fix pre-existing boolq_gen mock data structure (strings -> dicts) - Fix pre-existing test_generator_exception_returns_500[/get_boolq_hard] assertion All 316 tests pass with 0 warnings.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a backend CI workflow, extensive pytest fixtures and tests, corrects a generator type annotation, and hardens the Flask backend with input validation, safer JSON handling, structured error responses, file/YouTube safeguards, and logging. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Server as Flask Server
participant Gen as QuestionGenerator/LLM
participant Pipeline as QA Pipeline/Predictor
participant FS as FileProcessor / yt-dlp
participant Docs as Google Docs Service
Client->>Server: POST /generate {input_text, max_questions}
Server->>Server: parse JSON (silent), validate input_text, clamp max_questions
alt file or youtube flow
Server->>FS: download/process file or subtitles
FS-->>Server: extracted text or error
end
alt docs flow
Server->>Docs: fetch document (if initialized)
Docs-->>Server: content or 503 error
end
Server->>Gen: generate questions (may call LLM)
Gen-->>Server: questions or raise error
Server->>Pipeline: predict/score answers
Pipeline-->>Server: answers/confidence
Server-->>Client: 200 JSON {questions, answers}
alt any error
Server-->>Client: 4xx/5xx JSON error (logged)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/server.py (1)
503-518:⚠️ Potential issue | 🟠 MajorDon't duplicate the correct answer in MCQ form choices.
The MCQ payload already includes the correct answer inside
options(seebackend/conftest.py, Lines 23-37). Prependingqapair["answer"]again can render the correct choice twice and drop one distractor.Suggested fix
- choices = [qapair["answer"]] + valid_options[ - :3 - ] # Include up to the first 3 options + choices = list(dict.fromkeys([qapair["answer"], *valid_options]))[ + :4 + ] @@ - choices = [qapair["answer"]] + valid_options[ - :3 - ] # Include up to the first 3 options + choices = list(dict.fromkeys([qapair["answer"], *valid_options]))[ + :4 + ]Also applies to: 567-577
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 503 - 518, The code in the question_type "get_mcq" block prepends qapair["answer"] to valid_options which can duplicate the correct answer (since options already include it); fix by building choices from valid_options after filtering out empty entries and then ensure the answer is present exactly once: remove any occurrences of qapair["answer"] from valid_options, take up to three distractors from that list, assemble choices = [qapair["answer"]] + distractors (or if you prefer, add the answer only if it wasn't in options), then random.shuffle(choices); apply the same de-duplication and assembly logic to the similar code at the other occurrence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/conftest.py`:
- Around line 194-240: The fixture currently only restores side_effect and
return_value but leaves call history and child-mock state, so add a reset of
call history by invoking reset_mock() on each mocked object (e.g. _mock_mcq_gen,
_mock_boolq_gen, _mock_shortq_gen, _mock_answer_predictor,
_mock_question_generator, _mock_file_processor, _mock_qa_pipeline,
_mock_llm_generator) before reassigning side_effect/return_value; this clears
call_count, call_args, and child mocks so assert_called()/call_args checks won't
leak between tests while preserving the existing deepcopy-based return_value
restoration.
In `@backend/server.py`:
- Around line 454-462: Validate each element of qa_pairs before accessing keys:
ensure qa_pairs is a list of dicts and each dict has non-empty string fields
"question" and "answer"; if any item is invalid return a 400 JSON error
describing the bad item. Update the code paths that build requests from qa_pairs
(the block that later accesses qapair["question"] / qapair["answer"]) to perform
this validation up front (check isinstance(item, dict),
isinstance(item.get("question"), str) and item["question"].strip() and same for
"answer") and use a clear jsonify({...}), 400 response instead of allowing
KeyError/TypeError to propagate. Ensure the validation runs in both places where
qa_pairs are consumed (the initial endpoint handling and the later
request-construction path) so callers receive a structured JSON error.
- Around line 791-821: The current implementation uses subtitle_path
(subtitles/{video_id}) which still causes races for concurrent requests of the
same video; change to a per-request unique directory or filename (e.g., include
a short UUID or use tempfile.mkdtemp) so each request writes to and later cleans
up its own path; update the subprocess.run -o target and the subsequent
matching_files glob usage (references: subtitle_path, the subprocess.run call
that sets "-o", and the matching_files = glob.glob(...) line) to use the new
per-request path, and ensure any cleanup/exception handling that references
subtitle_path is updated to remove only that request's files.
- Around line 330-345: The handler currently only checks truthiness and matching
lengths for input_questions and input_options which allows non-list types (e.g.,
strings) to slip through; add a reusable validator (e.g.,
_validate_questions_and_options) that asserts both input_questions and
input_options are instances of list, that their lengths match and are non-empty,
and returns a standardized HTTP 400/json error when invalid; call this validator
from this route and from /get_shortq_answer, /get_boolean_answer and the
hard-question handlers, and keep the existing _validate_input_text call for
input_text validation.
- Around line 136-137: The current POST handler uses
request.get_json(silent=True) or {} which allows non-object JSON (strings,
numbers, arrays) to flow into data.get("input_text", "") and cause 500s; update
the handler that sets data and input_text to first call
request.get_json(silent=True) into a temp (e.g., json_body) and explicitly
reject non-dict payloads with a 400 (or use the existing project helper that
enforces object JSON), then set data = json_body and continue to read input_text
from data.get("input_text", ""); apply the same pattern to other POST routes
using request.get_json(silent=True) or {}.
In `@backend/test_server.py`:
- Around line 615-628: The parametrized test test_with_input_questions is
asserting the wrong collaborator for the /get_boolq_hard case—/get_boolq_hard
routes to BoolQGen.generate_boolq (see BoolQGen.generate_boolq) not
QuestionGenerator.generate (mock_question_generator.generate); update the test
to branch on the endpoint: when endpoint == "/get_boolq_hard" assert that the
BoolQGen mock's generate_boolq was called with num_questions == 2 (and
appropriate args), otherwise assert mock_question_generator.generate was called
with num_questions == 2, using the mocks for BoolQGen and QuestionGenerator
referenced in the test to check the correct call history.
- Around line 1146-1157: The test test_happy_path_returns_content uses a mock
that returns a dict but the real GoogleDocsService.get_document_content (in
Generator.main) returns a string, so update the
mock_docs.get_document_content.return_value to be the same string shape the real
service returns (e.g., "Document text here") and update assertions to verify the
endpoint's response body matches that expected string (use
mock_docs.get_document_content.assert_called_once() and assert resp.json() or
resp.get_data() equals the string as the /get_content contract requires).
---
Outside diff comments:
In `@backend/server.py`:
- Around line 503-518: The code in the question_type "get_mcq" block prepends
qapair["answer"] to valid_options which can duplicate the correct answer (since
options already include it); fix by building choices from valid_options after
filtering out empty entries and then ensure the answer is present exactly once:
remove any occurrences of qapair["answer"] from valid_options, take up to three
distractors from that list, assemble choices = [qapair["answer"]] + distractors
(or if you prefer, add the answer only if it wasn't in options), then
random.shuffle(choices); apply the same de-duplication and assembly logic to the
similar code at the other occurrence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6f618d39-e6c5-494a-9795-62d09916dca1
📒 Files selected for processing (8)
.github/workflows/tests.ymlbackend/Generator/main.pybackend/conftest.pybackend/pytest.inibackend/requirements-test.txtbackend/server.pybackend/test_llm_generator.pybackend/test_server.py
| data = request.get_json(silent=True) or {} | ||
| input_text = data.get("input_text", "") |
There was a problem hiding this comment.
Reject non-object JSON bodies before the first .get().
request.get_json(silent=True) or {} only normalizes falsy payloads. Truthy JSON scalars/arrays like "hello", 123, or ["x"] still reach data.get(...), so this route—and every POST handler using the same pattern below—can fall into generic 500 handling instead of returning a 400.
Suggested fix
+def _get_json_object():
+ data = request.get_json(silent=True)
+ if not isinstance(data, dict):
+ return None, (jsonify({"error": "Request body must be a JSON object"}), 400)
+ return data, NoneUse that helper instead of request.get_json(silent=True) or {} in each POST route.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 136 - 137, The current POST handler uses
request.get_json(silent=True) or {} which allows non-object JSON (strings,
numbers, arrays) to flow into data.get("input_text", "") and cause 500s; update
the handler that sets data and input_text to first call
request.get_json(silent=True) into a temp (e.g., json_body) and explicitly
reject non-dict payloads with a 400 (or use the existing project helper that
enforces object JSON), then set data = json_body and continue to read input_text
from data.get("input_text", ""); apply the same pattern to other POST routes
using request.get_json(silent=True) or {}.
| data = request.get_json(silent=True) or {} | ||
| input_text = data.get("input_text", "") | ||
| input_questions = data.get("input_question", []) | ||
| input_options = data.get("input_options", []) | ||
| outputs = [] | ||
|
|
||
| if not input_questions or not input_options or len(input_questions) != len(input_options): | ||
| return jsonify({"output": outputs}) | ||
| input_text, err = _validate_input_text(input_text) | ||
| if err: | ||
| return err | ||
|
|
||
| for question, options in zip(input_questions, input_options): | ||
| # Generate answer using the QA model | ||
| qa_response = qa_model(question=question, context=input_text) | ||
| generated_answer = qa_response["answer"] | ||
| if ( | ||
| not input_questions | ||
| or not input_options | ||
| or len(input_questions) != len(input_options) | ||
| ): | ||
| return jsonify({"output": []}) | ||
|
|
There was a problem hiding this comment.
Reject non-list input_question / input_options payloads.
This route only checks truthiness and length. A JSON string like "abc" is truthy and iterable, so it can be treated as three questions; other wrong shapes can also slip through to bogus 200s or late 500s. The same validator should be reused by /get_shortq_answer, /get_boolean_answer, and the hard-question routes.
Suggested validation
+ if not isinstance(input_questions, list) or not all(
+ isinstance(question, str) for question in input_questions
+ ):
+ return jsonify({"error": "input_question must be a list of strings"}), 400
+
+ if not isinstance(input_options, list) or not all(
+ isinstance(options, list) for options in input_options
+ ):
+ return jsonify({"error": "input_options must be a list of option lists"}), 400
+
if (
not input_questions
or not input_options
or len(input_questions) != len(input_options)
):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| data = request.get_json(silent=True) or {} | |
| input_text = data.get("input_text", "") | |
| input_questions = data.get("input_question", []) | |
| input_options = data.get("input_options", []) | |
| outputs = [] | |
| if not input_questions or not input_options or len(input_questions) != len(input_options): | |
| return jsonify({"output": outputs}) | |
| input_text, err = _validate_input_text(input_text) | |
| if err: | |
| return err | |
| for question, options in zip(input_questions, input_options): | |
| # Generate answer using the QA model | |
| qa_response = qa_model(question=question, context=input_text) | |
| generated_answer = qa_response["answer"] | |
| if ( | |
| not input_questions | |
| or not input_options | |
| or len(input_questions) != len(input_options) | |
| ): | |
| return jsonify({"output": []}) | |
| data = request.get_json(silent=True) or {} | |
| input_text = data.get("input_text", "") | |
| input_questions = data.get("input_question", []) | |
| input_options = data.get("input_options", []) | |
| input_text, err = _validate_input_text(input_text) | |
| if err: | |
| return err | |
| if not isinstance(input_questions, list) or not all( | |
| isinstance(question, str) for question in input_questions | |
| ): | |
| return jsonify({"error": "input_question must be a list of strings"}), 400 | |
| if not isinstance(input_options, list) or not all( | |
| isinstance(options, list) for options in input_options | |
| ): | |
| return jsonify({"error": "input_options must be a list of option lists"}), 400 | |
| if ( | |
| not input_questions | |
| or not input_options | |
| or len(input_questions) != len(input_options) | |
| ): | |
| return jsonify({"output": []}) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 330 - 345, The handler currently only checks
truthiness and matching lengths for input_questions and input_options which
allows non-list types (e.g., strings) to slip through; add a reusable validator
(e.g., _validate_questions_and_options) that asserts both input_questions and
input_options are instances of list, that their lengths match and are non-empty,
and returns a standardized HTTP 400/json error when invalid; call this validator
from this route and from /get_shortq_answer, /get_boolean_answer and the
hard-question handlers, and keep the existing _validate_input_text call for
input_text validation.
| data = request.get_json(silent=True) or {} | ||
| qa_pairs = data.get("qa_pairs", []) | ||
| question_type = data.get("question_type", "") | ||
|
|
||
| if not qa_pairs or not isinstance(qa_pairs, list): | ||
| return jsonify({"error": "qa_pairs must be a non-empty list"}), 400 | ||
|
|
||
| if not question_type or not isinstance(question_type, str): | ||
| return jsonify({"error": "question_type is required"}), 400 |
There was a problem hiding this comment.
Validate each qa_pairs item before indexing into it.
The endpoint only verifies that qa_pairs is a non-empty list. Items like {} or "bad" still reach qapair["question"] / qapair["answer"], and that request-construction path runs before Line 615, so callers get an unstructured 500 instead of a stable JSON error.
Also applies to: 486-612
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 454 - 462, Validate each element of qa_pairs
before accessing keys: ensure qa_pairs is a list of dicts and each dict has
non-empty string fields "question" and "answer"; if any item is invalid return a
400 JSON error describing the bad item. Update the code paths that build
requests from qa_pairs (the block that later accesses qapair["question"] /
qapair["answer"]) to perform this validation up front (check isinstance(item,
dict), isinstance(item.get("question"), str) and item["question"].strip() and
same for "answer") and use a clear jsonify({...}), 400 response instead of
allowing KeyError/TypeError to propagate. Ensure the validation runs in both
places where qa_pairs are consumed (the initial endpoint handling and the later
request-construction path) so callers receive a structured JSON error.
| # Use video_id-scoped file path to avoid race conditions between | ||
| # concurrent requests for different videos. | ||
| # Safety: subtitle_path is safe because VIDEO_ID_PATTERN (above) restricts | ||
| # video_id to [a-zA-Z0-9_-]{11}, preventing path traversal and injection. | ||
| subtitle_path = os.path.join("subtitles", f"{video_id}") | ||
|
|
||
| latest_subtitle = max(subtitle_files, key=os.path.getctime) | ||
| transcript_text = clean_transcript(latest_subtitle) | ||
| try: | ||
| subprocess.run( | ||
| [ | ||
| "yt-dlp", | ||
| "--write-auto-sub", | ||
| "--sub-lang", "en", | ||
| "--skip-download", | ||
| "--sub-format", "vtt", | ||
| "-o", f"{subtitle_path}.%(ext)s", | ||
| f"https://www.youtube.com/watch?v={video_id}", | ||
| ], | ||
| check=True, | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=30, | ||
| ) | ||
| except subprocess.TimeoutExpired: | ||
| logger.exception("yt-dlp timed out for video %s", video_id) | ||
| return jsonify({"error": "Transcript download timed out"}), 504 | ||
| except subprocess.CalledProcessError as e: | ||
| logger.exception("yt-dlp failed for video %s: %s", video_id, e.stderr) | ||
| return jsonify({"error": "Failed to download subtitles"}), 500 | ||
|
|
||
| # Look for the VTT file specific to this video_id | ||
| matching_files = glob.glob(f"subtitles/{video_id}.*.vtt") |
There was a problem hiding this comment.
Use a per-request subtitle path, not just a per-video path.
subtitles/<video_id> avoids collisions between different videos, but concurrent requests for the same video still share the same download and cleanup target. One request can delete the VTT before the other reads it, causing intermittent 404/500s on popular videos.
Also applies to: 831-837
🧰 Tools
🪛 Ruff (0.15.9)
[error] 798-798: subprocess call: check for execution of untrusted input
(S603)
[error] 799-807: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 791 - 821, The current implementation uses
subtitle_path (subtitles/{video_id}) which still causes races for concurrent
requests of the same video; change to a per-request unique directory or filename
(e.g., include a short UUID or use tempfile.mkdtemp) so each request writes to
and later cleans up its own path; update the subprocess.run -o target and the
subsequent matching_files glob usage (references: subtitle_path, the
subprocess.run call that sets "-o", and the matching_files = glob.glob(...)
line) to use the new per-request path, and ensure any cleanup/exception handling
that references subtitle_path is updated to remove only that request's files.
- Add reset_mock() calls in conftest.py to clear call history between tests - Fix test_with_input_questions to assert correct mock for /get_boolq_hard All 316 tests pass with 0 warnings.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
backend/test_server.py (1)
1153-1164:⚠️ Potential issue | 🟠 MajorMake the happy-path
/get_contentmock match the real service contract.
get_document_content()returns raw document text, not{"content": ...}. With the current dict-shaped mock and status-only assertion, this test still won't catch a response-shape regression on the endpoint.Suggested test fix
with patch("server.docs_service") as mock_docs: - mock_docs.get_document_content.return_value = { - "content": "Document text here" - } + mock_docs.get_document_content.return_value = "Document text here" resp = client.post( "/get_content", json={"document_url": "https://docs.google.com/document/d/abc123"}, ) assert resp.status_code == 200 mock_docs.get_document_content.assert_called_once() + assert resp.get_json() == "Document text here"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_server.py` around lines 1153 - 1164, The test's mock incorrectly returns a dict but the real docs service get_document_content(document_url) returns a raw string; update test_happy_path_returns_content to have mock_docs.get_document_content.return_value = "Document text here" (raw string), then call client.post(...) as before and assert both resp.status_code == 200 and that the response body/JSON contains the expected shape/content (e.g., resp.json() or resp.get_data() matches the endpoint's contract, and optionally assert mock_docs.get_document_content.assert_called_once_with(...) to verify it was called with the document_url).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/conftest.py`:
- Around line 23-42: The fixtures in _DEFAULTS (mcq_gen and boolq_gen) do not
match the real generator contracts returned by Generator.main.py; update the
mcq_gen entry to include top-level keys "statement", "questions", and
"time_taken" (with "questions" containing the existing question objects) and
update the boolq_gen entry to include top-level keys "Text", "Count", and
"Boolean_Questions" (with "Boolean_Questions" containing the existing items and
"Count" reflecting the list length), so tests exercise the real response shape
used by functions that consume these fixtures.
- Around line 202-260: The teardown misses clearing side_effect and restoring
return_value for the two tracked mocks _mock_google_docs.get_document_content
and _mock_mediawiki_instance.summary, which lets mutated state leak between
tests; in the teardown add setting
_mock_google_docs.get_document_content.side_effect = None and
_mock_mediawiki_instance.summary.side_effect = None in the "Clear side effects"
block, and restore their return_value in the "Restore defaults" block using
copy.deepcopy(_DEFAULTS[...] ) if those keys exist or set return_value = None
otherwise, keeping the existing pattern used for _mock_llm_generator and others
and preserving the final reset_mock loop over _tracked_mocks.
In `@backend/test_server.py`:
- Around line 125-155: The tests for max_questions should assert the exact
normalized value produced by backend/server.py::_validate_max_questions instead
of loose range checks; update test_max_questions_clamped_high,
test_max_questions_clamped_low, test_max_questions_zero,
test_max_questions_non_integer_string (and analogous tests for /get_problems and
the LLM variants) to inspect
mock_mcq_gen.generate_mcq.call_args[0][0]["max_questions] and assert it equals
the precise clamped/default value (e.g., 20 for inputs >20, 1 for <=0, and the
configured default for non-integer strings) so the handler must pass the exact
normalized value to the collaborator.
---
Duplicate comments:
In `@backend/test_server.py`:
- Around line 1153-1164: The test's mock incorrectly returns a dict but the real
docs service get_document_content(document_url) returns a raw string; update
test_happy_path_returns_content to have
mock_docs.get_document_content.return_value = "Document text here" (raw string),
then call client.post(...) as before and assert both resp.status_code == 200 and
that the response body/JSON contains the expected shape/content (e.g.,
resp.json() or resp.get_data() matches the endpoint's contract, and optionally
assert mock_docs.get_document_content.assert_called_once_with(...) to verify it
was called with the document_url).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 155d0d76-f1f6-4236-a7c7-d09f66624617
📒 Files selected for processing (2)
backend/conftest.pybackend/test_server.py
- Set exact assertions for max_questions tests instead of loose thresholds - Fix /get_content mock to correctly return string instead of dict - Add missing tear_down keys for mocked third party instances - Align mock _DEFAULTS mcq_gen and boolq_gen structure to exact application structure
There was a problem hiding this comment.
♻️ Duplicate comments (1)
backend/conftest.py (1)
252-253:⚠️ Potential issue | 🟡 MinorMediawiki mock return_value not properly restored in teardown.
Line 177-179 sets
_mock_mediawiki_instance.summary.return_valueto a specific string, but teardown resets it toNone. After the first test completes, subsequent tests will havemediawiki.summary()returnNoneinstead of the expected default, breaking test isolation.Consider adding a
mediawikientry to_DEFAULTSand restoring it properly:🔧 Proposed fix
Add to
_DEFAULTS(around line 57):"mediawiki_summary": "Artificial intelligence is intelligence demonstrated by machines.",Update lines 177-179:
_mock_mediawiki_instance.summary.return_value = ( - "Artificial intelligence is intelligence demonstrated by machines." + _DEFAULTS["mediawiki_summary"] )Update line 253:
- _mock_mediawiki_instance.summary.return_value = None + _mock_mediawiki_instance.summary.return_value = _DEFAULTS["mediawiki_summary"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/conftest.py` around lines 252 - 253, The teardown is overwriting the MediaWiki mock's configured return value with None; add a "mediawiki_summary" key to _DEFAULTS (e.g., a default summary string) and use that key when setting and restoring _mock_mediawiki_instance.summary.return_value so tests get restored to the expected default. Update the setup that assigns _mock_mediawiki_instance.summary.return_value to use _DEFAULTS["mediawiki_summary"] and modify the teardown to restore _mock_mediawiki_instance.summary.return_value from _DEFAULTS instead of setting it to None, ensuring test isolation for _mock_mediawiki_instance.summary across tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@backend/conftest.py`:
- Around line 252-253: The teardown is overwriting the MediaWiki mock's
configured return value with None; add a "mediawiki_summary" key to _DEFAULTS
(e.g., a default summary string) and use that key when setting and restoring
_mock_mediawiki_instance.summary.return_value so tests get restored to the
expected default. Update the setup that assigns
_mock_mediawiki_instance.summary.return_value to use
_DEFAULTS["mediawiki_summary"] and modify the teardown to restore
_mock_mediawiki_instance.summary.return_value from _DEFAULTS instead of setting
it to None, ensuring test isolation for _mock_mediawiki_instance.summary across
tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c909bd22-9eb5-4227-8bf2-46709e2c4e79
📒 Files selected for processing (2)
backend/conftest.pybackend/test_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/test_server.py
Addressed Issues:
This PR addresses the test coverage gap for LLM endpoints — no existing issue tracks this. The 4 LLM endpoints (
/get_shortq_llm,/get_mcq_llm,/get_boolq_llm,/get_problems_llm) and the LLMQuestionGenerator parsing logic previously had zero test coverage.Screenshots:
Additional Notes:
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: TODO
Summary by CodeRabbit
Bug Fixes
New Features
Tests
Chores