Skip to content

Test/llm endpoint comprehensive tests#659

Open
zohaib-7035 wants to merge 16 commits into
AOSSIE-Org:mainfrom
zohaib-7035:test/llm-endpoint-comprehensive-tests
Open

Test/llm endpoint comprehensive tests#659
zohaib-7035 wants to merge 16 commits into
AOSSIE-Org:mainfrom
zohaib-7035:test/llm-endpoint-comprehensive-tests

Conversation

@zohaib-7035
Copy link
Copy Markdown

@zohaib-7035 zohaib-7035 commented Apr 4, 2026

Addressed Issues:

This PR addresses the test coverage gap for LLM endpoints — no existing issue tracks this. The 4 LLM endpoints (/get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm) and the LLMQuestionGenerator parsing logic previously had zero test coverage.

Screenshots:

image

Additional Notes:

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Summary by CodeRabbit

  • Bug Fixes

    • Improved input validation with clearer 400 errors, safer JSON parsing, stronger error handling and consistent 500 responses
    • Hardened file uploads (filename sanitization, extension checks, size handling) and YouTube transcript handling (ID validation, timeouts, clearer status codes)
  • New Features

    • Centralized structured JSON error responses and expanded HTTP status coverage (400/413/500/503/504)
    • Optional external-service initialization with graceful degradation
  • Tests

    • Added comprehensive backend test suites and shared test fixtures covering endpoints, parsers, and edge cases
    • CI workflow to run backend tests on multiple Python versions
  • Chores

    • Added test configuration and pinned test dependencies

zohaib-7035 and others added 13 commits February 10, 2026 20:56
…ening to server.py

Problem:
- No input validation on any API endpoint (empty text, huge payloads, invalid types)
- Missing error handling causes 500 errors with stack traces leaked to clients
- /getTranscript endpoint vulnerable to command injection via unsanitized video_id
- /getTranscript has race condition when concurrent requests use glob on shared folder
- File uploads have no size limits or extension validation
- No structured logging (bare print statements)
- Hardcoded credential paths with no env variable fallback

Solution:
- Add _validate_input_text() helper: checks for None, empty, type, and max length (50k chars)
- Add _validate_max_questions() helper: clamps to safe range [1, 20]
- Add _allowed_file() helper: whitelist txt/pdf/docx extensions
- Validate video_id against strict regex pattern (alphanumeric, 11 chars) before subprocess
- Add 30s timeout to yt-dlp subprocess call
- Scope subtitle files to video_id to prevent race conditions between concurrent requests
- Add Flask MAX_CONTENT_LENGTH (10 MB) with 413 error handler
- Wrap all endpoints in try/except with user-friendly error messages (no stack traces)
- Replace print() with Python logging module throughout
- Use os.environ.get() for Google service account file path
- Add input validation to generate_gform (qa_pairs and question_type)
- Use request.get_json(silent=True) to handle malformed JSON gracefully

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix potential NameError in /getTranscript by wrapping clean_transcript
  in try/except and removing unused expected_vtt variable
- Fix input_questions (list) incorrectly passed as num_questions (int)
  in get_shortq_hard, get_mcq_hard, get_boolq_hard; fix type annotation
  in QuestionGenerator.generate() from bool to int
- Add try/except to generate_gform, remove webbrowser.open_new_tab()
  server-side anti-pattern, return structured JSON response
- Remove unused exception variable 'e' across all except blocks (Ruff F841)
- Tighten glob pattern in getTranscript to use dot separator
- Add strict=True to zip() in get_mcq_answer for defensive validation

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix critical bug: predict_boolean_answer receives a string instead of
  a list, causing character-level iteration; now wraps question in [question]
- Fix critical bug: answer_style="true_false" is not a valid option in
  QuestionGenerator; changed to "sentences" in get_boolq_hard
- Fix critical bug: make_question_harder in get_boolq_hard receives full
  dict instead of question string; aligned with other hard-question endpoints
- Fix major issue: num_questions silently ignored because use_evaluator
  defaults to False; added use_evaluator=True to all 3 hard-question endpoints
- Fix minor issue: subtitles directory creation moved from __main__ guard
  to module level so it works under Gunicorn/WSGI deployments
- Fix nitpick: num_questions type annotation updated to int | None per PEP 484

Co-authored-by: Cursor <cursoragent@cursor.com>
The length equality check above the loop already guarantees both lists
are the same size, making strict=True redundant. Removing it avoids a
runtime error on Python versions below 3.10.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the minimal test_server.py (which required a live server and
used raw requests) with a proper pytest-based test suite that runs
against Flask's built-in test client — no running server or GPU needed.

172 tests covering:
- All 15 API endpoints with valid input
- Invalid input (empty text, missing fields, wrong types)
- Boundary-value validation for max_questions (clamping 1–20)
- Edge cases (unicode, emoji, special characters, very long text)
- File upload with accepted/rejected file types
- HTTP method enforcement (405 on wrong methods)
- Error handling (generator/pipeline exceptions → 500)

Also adds:
- backend/conftest.py: mocks all heavy ML dependencies (transformers,
  spacy, Generator, Google APIs) at the sys.modules level so the full
  suite runs in <1 second
- .github/workflows/tests.yml: CI workflow that runs tests on every
  push/PR to main across Python 3.9, 3.10, and 3.11

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace `int | None` with `Optional[int]` in Generator/main.py to
ensure compatibility with Python 3.9 and satisfy CodeRabbit review.

Co-authored-by: Cursor <cursoragent@cursor.com>
…e check

Add one-line docstrings to all 16 route handlers and error handler in
server.py, and all ~80 test methods in test_server.py to satisfy the
80% docstring coverage threshold flagged by CodeRabbit pre-merge checks.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix path traversal vulnerability in upload_file() by sanitizing
  filename with werkzeug.utils.secure_filename before processing
- Fix credentials loading bug in generate_gform() — add store.get()
  so cached credentials are reused instead of re-running OAuth flow
- Use logger.exception instead of logger.error in subprocess handlers
  for better diagnostics with tracebacks
- Add safety comment explaining VIDEO_ID_PATTERN guards subtitle_path
- Remove unused mock_mcq_gen fixture arg from test_max_questions_float
- Add ClassVar annotations to mutable class attributes (Ruff compliance)
- Add 6 new tests: transcript happy-path, timeout (504), subprocess
  error (500), no subtitles (404), get_content happy-path, and
  service-unavailable (503) — total now 178 tests

Co-authored-by: Cursor <cursoragent@cursor.com>
Merged origin/main into test/backend-comprehensive-tests, resolving
conflicts in backend/server.py and backend/test_server.py:

- server.py: kept PR's import organization, input validation, and
  consistent logger usage; restored LLMQuestionGenerator import from
  upstream
- test_server.py: kept comprehensive pytest-based test suite over
  old-style test functions
- CRITICAL: /get_boolq_hard now uses BoolQGen.generate_boolq() instead
  of qg.generate(answer_style='sentences') which was generating
  sentence-based questions instead of boolean questions
- Add input validation (_validate_input_text, _validate_max_questions)
  to all LLM endpoints (get_shortq_llm, get_mcq_llm, get_boolq_llm,
  get_problems_llm) for consistency with other endpoints
- Replace app.logger with configured logger in LLM endpoints
- Add strict=True to zip() in get_mcq_answer for defense-in-depth
- Fix en-dash to hyphen-minus in _validate_input_text docstring
…onGenerator parsing logic

- Add LLM generator mock infrastructure to conftest.py (defaults, fixture, reset logic)
- Add ~40 endpoint tests for /get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm
- Create test_llm_generator.py with ~35 unit tests for JSON parsing, fallback parsing, boolean coercion
- Add pytest.ini with filterwarnings to suppress third-party deprecation warnings
- Include LLM endpoints in TestInputValidation and TestHTTPMethods parametrized suites
- Fix pre-existing boolq_gen mock data structure (strings -> dicts)
- Fix pre-existing test_generator_exception_returns_500[/get_boolq_hard] assertion

All 316 tests pass with 0 warnings.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 60a1633b-e411-4b61-888a-aeae3c8f077a

📥 Commits

Reviewing files that changed from the base of the PR and between 84cb0fb and 6e0f86f.

📒 Files selected for processing (1)
  • backend/conftest.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/conftest.py

📝 Walkthrough

Walkthrough

Adds a backend CI workflow, extensive pytest fixtures and tests, corrects a generator type annotation, and hardens the Flask backend with input validation, safer JSON handling, structured error responses, file/YouTube safeguards, and logging.

Changes

Cohort / File(s) Summary
CI / Testing Workflow
.github/workflows/tests.yml
New GitHub Actions workflow "Backend Tests" triggering on push/PR to main for backend/**, running pytest matrix on Python 3.10/3.11/3.12.
Type Annotation Fix
backend/Generator/main.py
Changed QuestionGenerator.generate param num_questions from bool to Optional[int]; added Optional import.
Test Infrastructure & Config
backend/conftest.py, backend/pytest.ini, backend/requirements-test.txt
Added comprehensive conftest.py with autouse mocks, reset logic, and many named fixtures; pytest warning filters in pytest.ini; test dependencies pinned in requirements-test.txt.
Server Robustness & Validation
backend/server.py
Replaced request.get_json() with safe parsing; added _validate_input_text and _validate_max_questions; wrapped handlers with try/except returning JSON 500; optional Google Docs init (503 fallback); sanitized uploads; yt-dlp timeout/error handling and cleanup; 413 handler and module logger.
Unit & Integration Tests
backend/test_llm_generator.py, backend/test_server.py
Added test_llm_generator.py for LLM parsing/coercion (imports real module with Llama stub); replaced ad-hoc script with comprehensive test_server.py pytest suite covering endpoints, validations, error paths, file/transcript flows, and edge cases.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Server as Flask Server
    participant Gen as QuestionGenerator/LLM
    participant Pipeline as QA Pipeline/Predictor
    participant FS as FileProcessor / yt-dlp
    participant Docs as Google Docs Service

    Client->>Server: POST /generate {input_text, max_questions}
    Server->>Server: parse JSON (silent), validate input_text, clamp max_questions
    alt file or youtube flow
        Server->>FS: download/process file or subtitles
        FS-->>Server: extracted text or error
    end
    alt docs flow
        Server->>Docs: fetch document (if initialized)
        Docs-->>Server: content or 503 error
    end
    Server->>Gen: generate questions (may call LLM)
    Gen-->>Server: questions or raise error
    Server->>Pipeline: predict/score answers
    Pipeline-->>Server: answers/confidence
    Server-->>Client: 200 JSON {questions, answers}
    alt any error
        Server-->>Client: 4xx/5xx JSON error (logged)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰
I hopped through tests and tightened seams,
Validated words and tuned the streams.
Mocks in burrows, logs aglow,
Downloads safe, retries slow.
A rabbit cheers—the pipeline grows. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title partially describes the changeset—it mentions "LLM endpoint comprehensive tests" which covers test files added for LLM endpoints, but the PR is significantly broader, introducing a full testing infrastructure with conftest.py fixtures, server.py hardening, pytest.ini configuration, and testing for the entire backend. Consider a more comprehensive title such as 'Add comprehensive backend testing infrastructure and endpoint tests' to better reflect the scope of testing setup, server hardening, and LLM endpoint coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/server.py (1)

503-518: ⚠️ Potential issue | 🟠 Major

Don't duplicate the correct answer in MCQ form choices.

The MCQ payload already includes the correct answer inside options (see backend/conftest.py, Lines 23-37). Prepending qapair["answer"] again can render the correct choice twice and drop one distractor.

Suggested fix
-            choices = [qapair["answer"]] + valid_options[
-                :3
-            ]  # Include up to the first 3 options
+            choices = list(dict.fromkeys([qapair["answer"], *valid_options]))[
+                :4
+            ]
@@
-                choices = [qapair["answer"]] + valid_options[
-                    :3
-                ]  # Include up to the first 3 options
+                choices = list(dict.fromkeys([qapair["answer"], *valid_options]))[
+                    :4
+                ]

Also applies to: 567-577

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 503 - 518, The code in the question_type
"get_mcq" block prepends qapair["answer"] to valid_options which can duplicate
the correct answer (since options already include it); fix by building choices
from valid_options after filtering out empty entries and then ensure the answer
is present exactly once: remove any occurrences of qapair["answer"] from
valid_options, take up to three distractors from that list, assemble choices =
[qapair["answer"]] + distractors (or if you prefer, add the answer only if it
wasn't in options), then random.shuffle(choices); apply the same de-duplication
and assembly logic to the similar code at the other occurrence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/conftest.py`:
- Around line 194-240: The fixture currently only restores side_effect and
return_value but leaves call history and child-mock state, so add a reset of
call history by invoking reset_mock() on each mocked object (e.g. _mock_mcq_gen,
_mock_boolq_gen, _mock_shortq_gen, _mock_answer_predictor,
_mock_question_generator, _mock_file_processor, _mock_qa_pipeline,
_mock_llm_generator) before reassigning side_effect/return_value; this clears
call_count, call_args, and child mocks so assert_called()/call_args checks won't
leak between tests while preserving the existing deepcopy-based return_value
restoration.

In `@backend/server.py`:
- Around line 454-462: Validate each element of qa_pairs before accessing keys:
ensure qa_pairs is a list of dicts and each dict has non-empty string fields
"question" and "answer"; if any item is invalid return a 400 JSON error
describing the bad item. Update the code paths that build requests from qa_pairs
(the block that later accesses qapair["question"] / qapair["answer"]) to perform
this validation up front (check isinstance(item, dict),
isinstance(item.get("question"), str) and item["question"].strip() and same for
"answer") and use a clear jsonify({...}), 400 response instead of allowing
KeyError/TypeError to propagate. Ensure the validation runs in both places where
qa_pairs are consumed (the initial endpoint handling and the later
request-construction path) so callers receive a structured JSON error.
- Around line 791-821: The current implementation uses subtitle_path
(subtitles/{video_id}) which still causes races for concurrent requests of the
same video; change to a per-request unique directory or filename (e.g., include
a short UUID or use tempfile.mkdtemp) so each request writes to and later cleans
up its own path; update the subprocess.run -o target and the subsequent
matching_files glob usage (references: subtitle_path, the subprocess.run call
that sets "-o", and the matching_files = glob.glob(...) line) to use the new
per-request path, and ensure any cleanup/exception handling that references
subtitle_path is updated to remove only that request's files.
- Around line 330-345: The handler currently only checks truthiness and matching
lengths for input_questions and input_options which allows non-list types (e.g.,
strings) to slip through; add a reusable validator (e.g.,
_validate_questions_and_options) that asserts both input_questions and
input_options are instances of list, that their lengths match and are non-empty,
and returns a standardized HTTP 400/json error when invalid; call this validator
from this route and from /get_shortq_answer, /get_boolean_answer and the
hard-question handlers, and keep the existing _validate_input_text call for
input_text validation.
- Around line 136-137: The current POST handler uses
request.get_json(silent=True) or {} which allows non-object JSON (strings,
numbers, arrays) to flow into data.get("input_text", "") and cause 500s; update
the handler that sets data and input_text to first call
request.get_json(silent=True) into a temp (e.g., json_body) and explicitly
reject non-dict payloads with a 400 (or use the existing project helper that
enforces object JSON), then set data = json_body and continue to read input_text
from data.get("input_text", ""); apply the same pattern to other POST routes
using request.get_json(silent=True) or {}.

In `@backend/test_server.py`:
- Around line 615-628: The parametrized test test_with_input_questions is
asserting the wrong collaborator for the /get_boolq_hard case—/get_boolq_hard
routes to BoolQGen.generate_boolq (see BoolQGen.generate_boolq) not
QuestionGenerator.generate (mock_question_generator.generate); update the test
to branch on the endpoint: when endpoint == "/get_boolq_hard" assert that the
BoolQGen mock's generate_boolq was called with num_questions == 2 (and
appropriate args), otherwise assert mock_question_generator.generate was called
with num_questions == 2, using the mocks for BoolQGen and QuestionGenerator
referenced in the test to check the correct call history.
- Around line 1146-1157: The test test_happy_path_returns_content uses a mock
that returns a dict but the real GoogleDocsService.get_document_content (in
Generator.main) returns a string, so update the
mock_docs.get_document_content.return_value to be the same string shape the real
service returns (e.g., "Document text here") and update assertions to verify the
endpoint's response body matches that expected string (use
mock_docs.get_document_content.assert_called_once() and assert resp.json() or
resp.get_data() equals the string as the /get_content contract requires).

---

Outside diff comments:
In `@backend/server.py`:
- Around line 503-518: The code in the question_type "get_mcq" block prepends
qapair["answer"] to valid_options which can duplicate the correct answer (since
options already include it); fix by building choices from valid_options after
filtering out empty entries and then ensure the answer is present exactly once:
remove any occurrences of qapair["answer"] from valid_options, take up to three
distractors from that list, assemble choices = [qapair["answer"]] + distractors
(or if you prefer, add the answer only if it wasn't in options), then
random.shuffle(choices); apply the same de-duplication and assembly logic to the
similar code at the other occurrence.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f618d39-e6c5-494a-9795-62d09916dca1

📥 Commits

Reviewing files that changed from the base of the PR and between 2038116 and 6600927.

📒 Files selected for processing (8)
  • .github/workflows/tests.yml
  • backend/Generator/main.py
  • backend/conftest.py
  • backend/pytest.ini
  • backend/requirements-test.txt
  • backend/server.py
  • backend/test_llm_generator.py
  • backend/test_server.py

Comment thread backend/conftest.py
Comment thread backend/server.py
Comment on lines +136 to 137
data = request.get_json(silent=True) or {}
input_text = data.get("input_text", "")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject non-object JSON bodies before the first .get().

request.get_json(silent=True) or {} only normalizes falsy payloads. Truthy JSON scalars/arrays like "hello", 123, or ["x"] still reach data.get(...), so this route—and every POST handler using the same pattern below—can fall into generic 500 handling instead of returning a 400.

Suggested fix
+def _get_json_object():
+    data = request.get_json(silent=True)
+    if not isinstance(data, dict):
+        return None, (jsonify({"error": "Request body must be a JSON object"}), 400)
+    return data, None

Use that helper instead of request.get_json(silent=True) or {} in each POST route.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 136 - 137, The current POST handler uses
request.get_json(silent=True) or {} which allows non-object JSON (strings,
numbers, arrays) to flow into data.get("input_text", "") and cause 500s; update
the handler that sets data and input_text to first call
request.get_json(silent=True) into a temp (e.g., json_body) and explicitly
reject non-dict payloads with a 400 (or use the existing project helper that
enforces object JSON), then set data = json_body and continue to read input_text
from data.get("input_text", ""); apply the same pattern to other POST routes
using request.get_json(silent=True) or {}.

Comment thread backend/server.py
Comment on lines +330 to 345
data = request.get_json(silent=True) or {}
input_text = data.get("input_text", "")
input_questions = data.get("input_question", [])
input_options = data.get("input_options", [])
outputs = []

if not input_questions or not input_options or len(input_questions) != len(input_options):
return jsonify({"output": outputs})
input_text, err = _validate_input_text(input_text)
if err:
return err

for question, options in zip(input_questions, input_options):
# Generate answer using the QA model
qa_response = qa_model(question=question, context=input_text)
generated_answer = qa_response["answer"]
if (
not input_questions
or not input_options
or len(input_questions) != len(input_options)
):
return jsonify({"output": []})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject non-list input_question / input_options payloads.

This route only checks truthiness and length. A JSON string like "abc" is truthy and iterable, so it can be treated as three questions; other wrong shapes can also slip through to bogus 200s or late 500s. The same validator should be reused by /get_shortq_answer, /get_boolean_answer, and the hard-question routes.

Suggested validation
+    if not isinstance(input_questions, list) or not all(
+        isinstance(question, str) for question in input_questions
+    ):
+        return jsonify({"error": "input_question must be a list of strings"}), 400
+
+    if not isinstance(input_options, list) or not all(
+        isinstance(options, list) for options in input_options
+    ):
+        return jsonify({"error": "input_options must be a list of option lists"}), 400
+
     if (
         not input_questions
         or not input_options
         or len(input_questions) != len(input_options)
     ):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data = request.get_json(silent=True) or {}
input_text = data.get("input_text", "")
input_questions = data.get("input_question", [])
input_options = data.get("input_options", [])
outputs = []
if not input_questions or not input_options or len(input_questions) != len(input_options):
return jsonify({"output": outputs})
input_text, err = _validate_input_text(input_text)
if err:
return err
for question, options in zip(input_questions, input_options):
# Generate answer using the QA model
qa_response = qa_model(question=question, context=input_text)
generated_answer = qa_response["answer"]
if (
not input_questions
or not input_options
or len(input_questions) != len(input_options)
):
return jsonify({"output": []})
data = request.get_json(silent=True) or {}
input_text = data.get("input_text", "")
input_questions = data.get("input_question", [])
input_options = data.get("input_options", [])
input_text, err = _validate_input_text(input_text)
if err:
return err
if not isinstance(input_questions, list) or not all(
isinstance(question, str) for question in input_questions
):
return jsonify({"error": "input_question must be a list of strings"}), 400
if not isinstance(input_options, list) or not all(
isinstance(options, list) for options in input_options
):
return jsonify({"error": "input_options must be a list of option lists"}), 400
if (
not input_questions
or not input_options
or len(input_questions) != len(input_options)
):
return jsonify({"output": []})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 330 - 345, The handler currently only checks
truthiness and matching lengths for input_questions and input_options which
allows non-list types (e.g., strings) to slip through; add a reusable validator
(e.g., _validate_questions_and_options) that asserts both input_questions and
input_options are instances of list, that their lengths match and are non-empty,
and returns a standardized HTTP 400/json error when invalid; call this validator
from this route and from /get_shortq_answer, /get_boolean_answer and the
hard-question handlers, and keep the existing _validate_input_text call for
input_text validation.

Comment thread backend/server.py
Comment on lines +454 to +462
data = request.get_json(silent=True) or {}
qa_pairs = data.get("qa_pairs", [])
question_type = data.get("question_type", "")

if not qa_pairs or not isinstance(qa_pairs, list):
return jsonify({"error": "qa_pairs must be a non-empty list"}), 400

if not question_type or not isinstance(question_type, str):
return jsonify({"error": "question_type is required"}), 400
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate each qa_pairs item before indexing into it.

The endpoint only verifies that qa_pairs is a non-empty list. Items like {} or "bad" still reach qapair["question"] / qapair["answer"], and that request-construction path runs before Line 615, so callers get an unstructured 500 instead of a stable JSON error.

Also applies to: 486-612

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 454 - 462, Validate each element of qa_pairs
before accessing keys: ensure qa_pairs is a list of dicts and each dict has
non-empty string fields "question" and "answer"; if any item is invalid return a
400 JSON error describing the bad item. Update the code paths that build
requests from qa_pairs (the block that later accesses qapair["question"] /
qapair["answer"]) to perform this validation up front (check isinstance(item,
dict), isinstance(item.get("question"), str) and item["question"].strip() and
same for "answer") and use a clear jsonify({...}), 400 response instead of
allowing KeyError/TypeError to propagate. Ensure the validation runs in both
places where qa_pairs are consumed (the initial endpoint handling and the later
request-construction path) so callers receive a structured JSON error.

Comment thread backend/server.py
Comment on lines +791 to +821
# Use video_id-scoped file path to avoid race conditions between
# concurrent requests for different videos.
# Safety: subtitle_path is safe because VIDEO_ID_PATTERN (above) restricts
# video_id to [a-zA-Z0-9_-]{11}, preventing path traversal and injection.
subtitle_path = os.path.join("subtitles", f"{video_id}")

latest_subtitle = max(subtitle_files, key=os.path.getctime)
transcript_text = clean_transcript(latest_subtitle)
try:
subprocess.run(
[
"yt-dlp",
"--write-auto-sub",
"--sub-lang", "en",
"--skip-download",
"--sub-format", "vtt",
"-o", f"{subtitle_path}.%(ext)s",
f"https://www.youtube.com/watch?v={video_id}",
],
check=True,
capture_output=True,
text=True,
timeout=30,
)
except subprocess.TimeoutExpired:
logger.exception("yt-dlp timed out for video %s", video_id)
return jsonify({"error": "Transcript download timed out"}), 504
except subprocess.CalledProcessError as e:
logger.exception("yt-dlp failed for video %s: %s", video_id, e.stderr)
return jsonify({"error": "Failed to download subtitles"}), 500

# Look for the VTT file specific to this video_id
matching_files = glob.glob(f"subtitles/{video_id}.*.vtt")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a per-request subtitle path, not just a per-video path.

subtitles/<video_id> avoids collisions between different videos, but concurrent requests for the same video still share the same download and cleanup target. One request can delete the VTT before the other reads it, causing intermittent 404/500s on popular videos.

Also applies to: 831-837

🧰 Tools
🪛 Ruff (0.15.9)

[error] 798-798: subprocess call: check for execution of untrusted input

(S603)


[error] 799-807: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 791 - 821, The current implementation uses
subtitle_path (subtitles/{video_id}) which still causes races for concurrent
requests of the same video; change to a per-request unique directory or filename
(e.g., include a short UUID or use tempfile.mkdtemp) so each request writes to
and later cleans up its own path; update the subprocess.run -o target and the
subsequent matching_files glob usage (references: subtitle_path, the
subprocess.run call that sets "-o", and the matching_files = glob.glob(...)
line) to use the new per-request path, and ensure any cleanup/exception handling
that references subtitle_path is updated to remove only that request's files.

Comment thread backend/test_server.py Outdated
Comment thread backend/test_server.py
- Add reset_mock() calls in conftest.py to clear call history between tests
- Fix test_with_input_questions to assert correct mock for /get_boolq_hard

All 316 tests pass with 0 warnings.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/test_server.py (1)

1153-1164: ⚠️ Potential issue | 🟠 Major

Make the happy-path /get_content mock match the real service contract.

get_document_content() returns raw document text, not {"content": ...}. With the current dict-shaped mock and status-only assertion, this test still won't catch a response-shape regression on the endpoint.

Suggested test fix
     with patch("server.docs_service") as mock_docs:
-        mock_docs.get_document_content.return_value = {
-            "content": "Document text here"
-        }
+        mock_docs.get_document_content.return_value = "Document text here"
         resp = client.post(
             "/get_content",
             json={"document_url": "https://docs.google.com/document/d/abc123"},
         )
         assert resp.status_code == 200
         mock_docs.get_document_content.assert_called_once()
+        assert resp.get_json() == "Document text here"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_server.py` around lines 1153 - 1164, The test's mock incorrectly
returns a dict but the real docs service get_document_content(document_url)
returns a raw string; update test_happy_path_returns_content to have
mock_docs.get_document_content.return_value = "Document text here" (raw string),
then call client.post(...) as before and assert both resp.status_code == 200 and
that the response body/JSON contains the expected shape/content (e.g.,
resp.json() or resp.get_data() matches the endpoint's contract, and optionally
assert mock_docs.get_document_content.assert_called_once_with(...) to verify it
was called with the document_url).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/conftest.py`:
- Around line 23-42: The fixtures in _DEFAULTS (mcq_gen and boolq_gen) do not
match the real generator contracts returned by Generator.main.py; update the
mcq_gen entry to include top-level keys "statement", "questions", and
"time_taken" (with "questions" containing the existing question objects) and
update the boolq_gen entry to include top-level keys "Text", "Count", and
"Boolean_Questions" (with "Boolean_Questions" containing the existing items and
"Count" reflecting the list length), so tests exercise the real response shape
used by functions that consume these fixtures.
- Around line 202-260: The teardown misses clearing side_effect and restoring
return_value for the two tracked mocks _mock_google_docs.get_document_content
and _mock_mediawiki_instance.summary, which lets mutated state leak between
tests; in the teardown add setting
_mock_google_docs.get_document_content.side_effect = None and
_mock_mediawiki_instance.summary.side_effect = None in the "Clear side effects"
block, and restore their return_value in the "Restore defaults" block using
copy.deepcopy(_DEFAULTS[...] ) if those keys exist or set return_value = None
otherwise, keeping the existing pattern used for _mock_llm_generator and others
and preserving the final reset_mock loop over _tracked_mocks.

In `@backend/test_server.py`:
- Around line 125-155: The tests for max_questions should assert the exact
normalized value produced by backend/server.py::_validate_max_questions instead
of loose range checks; update test_max_questions_clamped_high,
test_max_questions_clamped_low, test_max_questions_zero,
test_max_questions_non_integer_string (and analogous tests for /get_problems and
the LLM variants) to inspect
mock_mcq_gen.generate_mcq.call_args[0][0]["max_questions] and assert it equals
the precise clamped/default value (e.g., 20 for inputs >20, 1 for <=0, and the
configured default for non-integer strings) so the handler must pass the exact
normalized value to the collaborator.

---

Duplicate comments:
In `@backend/test_server.py`:
- Around line 1153-1164: The test's mock incorrectly returns a dict but the real
docs service get_document_content(document_url) returns a raw string; update
test_happy_path_returns_content to have
mock_docs.get_document_content.return_value = "Document text here" (raw string),
then call client.post(...) as before and assert both resp.status_code == 200 and
that the response body/JSON contains the expected shape/content (e.g.,
resp.json() or resp.get_data() matches the endpoint's contract, and optionally
assert mock_docs.get_document_content.assert_called_once_with(...) to verify it
was called with the document_url).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 155d0d76-f1f6-4236-a7c7-d09f66624617

📥 Commits

Reviewing files that changed from the base of the PR and between 6600927 and b28be4a.

📒 Files selected for processing (2)
  • backend/conftest.py
  • backend/test_server.py

Comment thread backend/conftest.py
Comment thread backend/conftest.py
Comment thread backend/test_server.py
- Set exact assertions for max_questions tests instead of loose thresholds
- Fix /get_content mock to correctly return string instead of dict
- Add missing tear_down keys for mocked third party instances
- Align mock _DEFAULTS mcq_gen and boolq_gen structure to exact application structure
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
backend/conftest.py (1)

252-253: ⚠️ Potential issue | 🟡 Minor

Mediawiki mock return_value not properly restored in teardown.

Line 177-179 sets _mock_mediawiki_instance.summary.return_value to a specific string, but teardown resets it to None. After the first test completes, subsequent tests will have mediawiki.summary() return None instead of the expected default, breaking test isolation.

Consider adding a mediawiki entry to _DEFAULTS and restoring it properly:

🔧 Proposed fix

Add to _DEFAULTS (around line 57):

"mediawiki_summary": "Artificial intelligence is intelligence demonstrated by machines.",

Update lines 177-179:

 _mock_mediawiki_instance.summary.return_value = (
-    "Artificial intelligence is intelligence demonstrated by machines."
+    _DEFAULTS["mediawiki_summary"]
 )

Update line 253:

-    _mock_mediawiki_instance.summary.return_value = None
+    _mock_mediawiki_instance.summary.return_value = _DEFAULTS["mediawiki_summary"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/conftest.py` around lines 252 - 253, The teardown is overwriting the
MediaWiki mock's configured return value with None; add a "mediawiki_summary"
key to _DEFAULTS (e.g., a default summary string) and use that key when setting
and restoring _mock_mediawiki_instance.summary.return_value so tests get
restored to the expected default. Update the setup that assigns
_mock_mediawiki_instance.summary.return_value to use
_DEFAULTS["mediawiki_summary"] and modify the teardown to restore
_mock_mediawiki_instance.summary.return_value from _DEFAULTS instead of setting
it to None, ensuring test isolation for _mock_mediawiki_instance.summary across
tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/conftest.py`:
- Around line 252-253: The teardown is overwriting the MediaWiki mock's
configured return value with None; add a "mediawiki_summary" key to _DEFAULTS
(e.g., a default summary string) and use that key when setting and restoring
_mock_mediawiki_instance.summary.return_value so tests get restored to the
expected default. Update the setup that assigns
_mock_mediawiki_instance.summary.return_value to use
_DEFAULTS["mediawiki_summary"] and modify the teardown to restore
_mock_mediawiki_instance.summary.return_value from _DEFAULTS instead of setting
it to None, ensuring test isolation for _mock_mediawiki_instance.summary across
tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c909bd22-9eb5-4227-8bf2-46709e2c4e79

📥 Commits

Reviewing files that changed from the base of the PR and between b28be4a and 84cb0fb.

📒 Files selected for processing (2)
  • backend/conftest.py
  • backend/test_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/test_server.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants