Refactor/system reliability #652

Open
Arul-Aravind wants to merge 2 commits into AOSSIE-Org:main from Arul-Aravind:refactor/system-reliability

Conversation

@Arul-Aravind

@Arul-Aravind Arul-Aravind commented Mar 31, 2026

Summary

This PR improves EduAid's reliability and maintainability across the backend, web app, browser extension, and desktop app.

Main improvements in this PR:

  • Refactored the Flask backend to use a cleaner app-factory pattern with lazy-loaded heavy services, making startup more reliable and the codebase easier to test and maintain.
  • Added stronger request validation, clearer error handling, a /health endpoint, safer file upload handling, and safer transcript-fetching behavior.
  • Fixed a backend correctness bug in boolean answer generation.
  • Fixed hard-question generation endpoints so they correctly respect max_questions.
  • Improved Google Form generation responses and handling for MCQ, boolean, and short-answer flows.
  • Replaced the old live-server style backend test script with actual Flask route tests.
  • Fixed web and extension integration issues around hard-mode routing, output parsing, loading states, and supported file types.
  • Fixed extension host permissions and removed recursive storage-write behavior in the background service worker.
  • Fixed the Electron desktop build flow so the built web app is properly copied into the desktop bundle, and corrected the desktop icon path.
  • Cleaned current web build warnings by fixing naming issues, hook dependency issues, dead state, API client export style, and PDF worker loop structure.
  • Removed unused direct dependencies and added the missing Babel workaround required by the current CRA toolchain.
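
The factory and lazy-service combination in the first bullet can be sketched as follows. This is a minimal stand-in, not the PR's actual code: create_app() and ServiceRegistry are names from this PR, but the internals and the qa_model loader here are assumptions.

```python
class ServiceRegistry:
    """Illustrative lazy-loading registry: services are built on first use."""

    def __init__(self, loaders):
        self._loaders = loaders        # service name -> zero-arg factory
        self._instances = {}

    def get(self, name):
        # Heavy models are constructed only when a route first needs them,
        # so importing or starting the app stays fast.
        if name not in self._instances:
            self._instances[name] = self._loaders[name]()
        return self._instances[name]


def create_app(registry=None):
    # Stand-in for the Flask factory: accepting an injectable registry is
    # what lets tests swap in stubs instead of real models.
    return {"services": registry or ServiceRegistry({"qa_model": lambda: object()})}
```

With this shape, a test can hand create_app() its own ServiceRegistry of stubs and never touch the real models.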

Testing

I tested these changes locally with:

  • python3 -m py_compile backend/server.py backend/test_server.py
  • python3 backend/test_server.py
  • npm run build in eduaid_web
  • npm run build in extension
  • npm run build in eduaid_desktop

Addressed Issues:

N/A - Proactive architectural and reliability audit in preparation for GSoC 2026.

Screenshots/Recordings:

N/A for most of this PR because the main changes are reliability, API, build, and integration improvements.

Additional Notes:

  • This PR is scoped as one major reliability and maintainability improvement across EduAid.
  • Builds now pass successfully.
  • There is still an upstream Browserslist caniuse-lite warning from the tooling stack in some environments, but it is not introduced by this PR and does not block successful builds.

AI Usage Disclosure:

  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with it. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: OpenAI, Gemini for refactoring assistance, test scaffolding, and review support. I reviewed, edited, and tested all changes locally before submission.

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features

    • Added health check endpoint for monitoring server status.
    • Extended file upload support to include DOCX and TXT formats.
  • Bug Fixes

    • Improved MCQ question handling and answer normalization.
    • Enhanced error validation and responses across API endpoints.
    • Improved Google Forms integration with better link handling.
  • Chores

    • Refactored backend server initialization and dependency management.
    • Updated extension permissions to align with new server configuration.
    • Improved build workflow for desktop application.

@coderabbitai

coderabbitai Bot commented Mar 31, 2026

📝 Walkthrough

The PR refactors the Flask backend into a factory pattern with lazy-loaded services and centralized request validation. Frontend components normalize MCQ data structures and update form generation. Build configurations and dependencies are updated across packages, with stricter type handling throughout.

Changes

  • Backend App Refactoring — backend/server.py, backend/test_server.py
    Migrated the Flask app from eager instantiation to a create_app() factory with a ServiceRegistry for lazy-loading dependencies. Added request validation helpers and custom exceptions (InvalidRequest, ServiceUnavailable). Rewrote tests to use the Flask test client with stubbed services instead of live HTTP requests.
  • Frontend Page Components — eduaid_web/src/pages/Output.jsx, eduaid_web/src/pages/Text_Input.jsx, extension/src/pages/question/Question.jsx, extension/src/pages/text_input/TextInput.jsx
    Added a normalizeMcqPair() helper to unify MCQ data shapes across output sources. Updated form generation to prefer edit_link over form_link. Refined state handling: removed unused fileContent, clamped question count to [1,...], extended hard-mode endpoint detection for boolean questions, and updated supported file formats in the UI.
  • Frontend Utilities & Workers — eduaid_web/src/utils/apiClient.js, eduaid_web/src/workers/pdfWorker.js, extension/src/pages/question/SidePanel.jsx
    Refactored shuffleArray() to avoid mutation by cloning the input before shuffling. Replaced .forEach loops with indexed for/for...of loops in the PDF worker. Updated form link selection in the side panel to use edit_link when available.
  • Frontend Navigation — eduaid_web/src/pages/PageNotFound.jsx, eduaid_web/src/App.js
    Updated component imports from snake_case to camelCase (Question_Type → QuestionType, Text_Input → TextInput). Renamed the useNavigate() return value from router to navigate and updated the dependency array.
  • Desktop & Extension Configuration — eduaid_desktop/config.js, extension/public/manifest.json
    Updated the desktop app icon path from assets/icons/win/aossie.ico to assets/aossie.ico. Changed extension host permissions from http://127.0.0.1:8000/* to http://localhost:5000/* and http://127.0.0.1:5000/*.
  • Build Scripts & Dependencies — eduaid_desktop/package.json, eduaid_web/package.json, extension/package.json
    Replaced the Windows xcopy-based copy step with a Node-based fs.rmSync/fs.cpSync step in the desktop build. Consolidated build:web and copy:web into the main build script. Updated dependencies: removed dotenv and react-toggle-switch; added @babel/plugin-proposal-private-property-in-object, browserslist, and caniuse-lite.
  • Extension Background Script — extension/public/background.js
    Added a clearBadge() helper to centralize badge-clearing logic. Moved context menu initialization into the chrome.runtime.onInstalled listener. Enhanced the askExtension handler with tab ID validation, local storage persistence, and content script injection. Improved generateQuizFromYouTube with script detection and conditional injection. Added error handling for storage operations and an explicit return value for the TEXT_SELECTED message.
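
The test rewrite in the first cohort (Flask test client with stubbed services) hinges on dependency injection. Stripped of Flask, the stubbing idea looks like this; StubAnswerPredictor echoes a name from backend/test_server.py, but the route body is invented for illustration:

```python
class StubAnswerPredictor:
    """Deterministic stand-in for the real boolean-answer model."""

    def predict_boolean_answer(self, payload):
        # Alternate True/False for however many questions arrive, so the
        # stub scales with input size instead of capping at two answers.
        return [i % 2 == 0 for i in range(len(payload["input_question"]))]


def answer_booleans(predictor, questions):
    # Hypothetical route body: it only talks to whatever predictor was
    # injected, which is what makes it testable without loading models.
    return predictor.predict_boolean_answer({"input_question": questions})


print(answer_booleans(StubAnswerPredictor(), ["Q1?", "Q2?", "Q3?"]))
# [True, False, True]
```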

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A factory app now grows so free,
With lazy services in registry!
MCQ pairs dance in unified delight,
Chrome extensions reach new port height.
Builds run swift on Node's command,
*bounces* — refactored codebase, perfectly planned! 🎉

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 1.18%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The title 'Refactor/system reliability' is vague and generic, and does not convey the main changes (backend factory pattern, validation improvements, service registry, etc.). Resolution: consider a more specific title such as 'Refactor Flask app to factory pattern with service registry' or 'Improve system reliability with app factory pattern and validation'.
✅ Passed checks (1 passed)

  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
eduaid_web/src/pages/Output.jsx (1)

151-157: ⚠️ Potential issue | 🟡 Minor

Add defensive check for output before iteration.

When questionType === "get_boolq", the code accesses qaPairsFromStorage["output"].forEach(...) without checking if output exists. If output is undefined, this will throw a runtime error.

🐛 Proposed fix
-      if (questionType === "get_boolq") {
+      if (questionType === "get_boolq" && qaPairsFromStorage["output"]) {
         qaPairsFromStorage["output"].forEach((qaPair) => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Output.jsx` around lines 151 - 157, Check that
qaPairsFromStorage["output"] exists and is an array before calling forEach when
questionType === "get_boolq"; replace the direct access with a defensive guard
(e.g., verify Array.isArray(qaPairsFromStorage.output) or use optional chaining)
and only push into combinedQaPairs when output is present to avoid runtime
errors in Output.jsx where qaPairsFromStorage and the "output" key are used.
🧹 Nitpick comments (8)
eduaid_web/src/pages/Text_Input.jsx (1)

191-193: Enforce the advertised upload formats in the file picker.

The UI says PDF, DOCX, TXT are supported, but the file input still allows any file type. Add an accept filter to reduce invalid uploads at source.

Suggested patch
-          <input type="file" ref={fileInputRef} onChange={handleFileUpload} style={{ display: "none" }} />
+          <input
+            type="file"
+            accept=".pdf,.docx,.txt,application/pdf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,text/plain"
+            ref={fileInputRef}
+            onChange={handleFileUpload}
+            style={{ display: "none" }}
+          />
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Text_Input.jsx` around lines 191 - 193, Update the
hidden file input in Text_Input.jsx to enforce the advertised formats by adding
an accept attribute (e.g.
accept=".pdf,.docx,.txt,application/pdf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,text/plain")
on the <input> that uses fileInputRef and handleFileUpload so the OS file picker
filters to PDF, DOCX and TXT; keep the existing handleFileUpload validation in
place for server-side safety.
eduaid_web/src/pages/Output.jsx (1)

90-108: Consider extracting normalizeMcqPair to a shared utility.

This function is duplicated verbatim in extension/src/pages/question/Question.jsx. Extracting it to a shared utility would reduce maintenance burden and ensure consistent behavior.

A shared location like eduaid_web/src/utils/questionUtils.js (and similarly for the extension) would allow both components to import the same normalization logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_web/src/pages/Output.jsx` around lines 90 - 108, The normalizeMcqPair
function is duplicated; extract it into a shared utility (e.g., create a new
function normalizeMcqPair in eduaid_web/src/utils/questionUtils.js) and replace
the local implementations in Output.jsx and
extension/src/pages/question/Question.jsx to import and call that exported
normalizeMcqPair; ensure the exported function preserves the same signature and
behavior (handles qaPair.answer as array or scalar, derives
question/question_statement, sets question_type, options, answer, and context)
and update imports in both files accordingly.
extension/src/pages/question/SidePanel.jsx (1)

116-122: Consider adding error handling for missing form URLs.

The fallback from edit_link to form_link is correct and consistent with other components. However, if both are missing/undefined, window.open(undefined, "_blank") will open a blank tab.

💡 Optional: Add defensive check
     if (response.ok) {
       const result = await response.json();
       const formUrl = result.edit_link || result.form_link;
+      if (!formUrl) {
+        console.error("No form URL returned from server");
+        return;
+      }
       window.open(formUrl, "_blank");
     } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/src/pages/question/SidePanel.jsx` around lines 116 - 122, The
response handling currently falls back to result.edit_link || result.form_link
then calls window.open(formUrl, "_blank") without ensuring formUrl exists;
change the logic in the SidePanel.jsx response.ok block to verify that formUrl
(from result.edit_link or result.form_link) is truthy before calling
window.open(formUrl, "_blank"), and when it's missing call console.error (or
show a user-facing message) and skip opening a blank tab; update the code around
the response.ok branch and the window.open invocation to perform this defensive
check.
extension/public/background.js (1)

27-42: Redundant storage write: text is stored twice.

Line 29 stores selectedText in chrome.storage.local, and the content script at contentScript.js:9 also stores the same text to the same key when it receives the message (per context snippet 2). This redundancy is harmless but unnecessary.

Consider removing one of the storage writes to simplify the flow. Since the background script stores first (line 29), the content script's generateQuestions function could skip its storage call, or vice versa.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/public/background.js` around lines 27 - 42, Remove the redundant
write by eliminating the duplicate chrome.storage.local.set call inside the
content script's generateQuestions function (so the background context menu
handler remains the single writer when the menu is used); update
generateQuestions in contentScript.js to rely on the selectedText from the
incoming message (chrome.runtime.onMessage handler) instead of storing it again
or, if you prefer to keep content script storage, remove the
chrome.storage.local.set call in the background context menu handler that
currently runs when info.menuItemId === "askExtension". Ensure whichever write
you remove still leaves selectedText available to downstream logic (message
payload or storage) and keep the error handling and badge-setting logic in the
background handler unchanged.
backend/test_server.py (1)

54-56: Stub behavior may not scale for larger inputs.

StubAnswerPredictor.predict_boolean_answer returns [True, False][:len(payload["input_question"])], which caps at 2 answers regardless of input size. While the current test uses exactly 2 questions, this could cause test failures if tests are extended to larger inputs.

💡 Generate answers matching input length
 class StubAnswerPredictor:
     def predict_boolean_answer(self, payload):
-        return [True, False][: len(payload["input_question"])]
+        count = len(payload["input_question"])
+        return [i % 2 == 0 for i in range(count)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_server.py` around lines 54 - 56, The stub in
StubAnswerPredictor.predict_boolean_answer currently slices a two-element list
so it never returns more than two answers; change it to generate a boolean list
whose length matches len(payload["input_question"]) (e.g., repeat or compute a
pattern and then slice to n) so the returned list scales with the number of
questions and preserves the alternating True/False behavior for arbitrary n.
backend/server.py (3)

587-590: Consider wrapping Google Forms API calls in try-except for clearer error messages.

If forms().create() or batchUpdate() fails (quota exceeded, network error, invalid credentials), the exception will propagate as a 500 error. Catching and re-raising as ServiceUnavailable with a user-friendly message would improve debuggability.

Proposed improvement
+        try:
             result = form_service.forms().create(body={"info": {"title": "EduAid form"}}).execute()
             form_service.forms().batchUpdate(
                 formId=result["formId"], body={"requests": requests_list}
             ).execute()
+        except Exception as exc:
+            raise ServiceUnavailable(f"Failed to create Google Form: {exc}") from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 587 - 590, Wrap the Google Forms calls in a
try-except: surround form_service.forms().create(...) and
form_service.forms().batchUpdate(formId=result["formId"], body={"requests":
requests_list}).execute() with a try block, catch specific API exceptions (e.g.,
googleapiclient.errors.HttpError) and a general Exception fallback, log the
original exception details and then raise ServiceUnavailable with a clear,
user-friendly message that includes context (e.g., "Failed to create or update
Google Form") while retaining the original error in the logs for debugging;
ensure you reference the result variable and requests_list when composing the
message for clarity.

692-716: Subprocess call is adequately protected by input validation.

Static analysis flags this as potentially unsafe, but the video_id is validated against [A-Za-z0-9_-]{6,20} on line 683, which prevents shell injection. The partial executable path (yt-dlp) is acceptable for system tools typically installed in standard locations.

One improvement: consider adding a timeout to prevent hanging on slow network conditions.

Optional: Add timeout
             subprocess.run(
                 [
                     "yt-dlp",
                     "--write-auto-sub",
                     "--sub-lang",
                     "en",
                     "--skip-download",
                     "--sub-format",
                     "vtt",
                     "-o",
                     output_template,
                     youtube_url,
                 ],
                 check=True,
                 capture_output=True,
                 text=True,
+                timeout=60,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 692 - 716, The subprocess.run invocation that
calls yt-dlp (in the block using subprocess.run with arguments including
"yt-dlp", output_template and youtube_url) should include a timeout to avoid
hanging on slow networks; add a sensible timeout param (e.g. timeout=30) to that
subprocess.run call and add an except subprocess.TimeoutExpired handler that
logs the timeout (use current_app.logger.exception or logger) referencing
video_id and returns a 504/ServiceUnavailable JSON response indicating the fetch
timed out, while preserving the existing FileNotFoundError and
CalledProcessError handling.

194-199: Consider documenting the asymmetric None handling.

_require_list returns an empty list for None but raises InvalidRequest for non-list types. This behavior is reasonable (treating missing fields as empty lists) but could surprise callers expecting strict validation for all non-list inputs. A brief docstring would clarify intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 194 - 199, The helper function _require_list
currently treats None as an empty list but raises InvalidRequest for any
non-list value; add a brief docstring to _require_list that explains this
asymmetric None handling (i.e., missing/None is normalized to [], while non-list
inputs are invalid), mention the rationale (treat missing fields as empty lists)
and that callers should expect InvalidRequest for wrong types, and ensure the
docstring references the function name _require_list and the exception
InvalidRequest so future readers understand the intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Line 722: Replace os.path.getctime with os.path.getmtime when computing
latest_subtitle from subtitle_files to use file modification time
(cross-platform consistent); update the expression latest_subtitle =
max(subtitle_files, key=os.path.getmtime) in the code that selects the most
recent subtitle file (search around the latest_subtitle and subtitle_files
usage) and ensure os is already imported.
- Around line 508-523: The loop using zip(input_questions, input_options) should
use zip(..., strict=True) to enforce equal-length iterables; update the zip call
inside the loop (the one iterating over input_questions and input_options in the
block that calls qa_model and computes cosine_similarity) to
zip(input_questions, input_options, strict=True) so Python raises immediately if
lengths differ (ensure runtime uses Python 3.10+ and adjust any tests or callers
expecting silent truncation).
- Around line 120-146: The _build_forms_service function currently uses
oauth2client.tools.run_flow (tools.run_flow) which triggers an interactive
browser OAuth flow unsuitable for headless servers; replace this with a service
account flow using
google.oauth2.service_account.Credentials.from_service_account_file and the
google-auth/google-auth-httplib2 stack: remove the oauth2client/file/tools logic
and token_file usage, load the JSON key with
Credentials.from_service_account_file (or use domain-wide delegation if needed),
create an AuthorizedHttp (or pass the credentials directly to discovery.build
via the credentials parameter), and ensure the Google Form is shared with the
service account email (or configure domain delegation) so
discovery.build("forms", "v1", ...) uses these service-account credentials
instead of tools.run_flow.

In `@eduaid_desktop/package.json`:
- Line 11: Add a Node.js engines requirement of at least 16.7.0 to package.json
and update README references to "16.7.0 or higher": specifically, add an
"engines" field in package.json (next to existing metadata) specifying "node":
">=16.7.0" so the build script that uses fs.cpSync in the "copy:web" npm script
is guaranteed to run, and update any README.md text that currently says "version
16 or higher" to "16.7.0 or higher" to keep documentation consistent with the
runtime requirement.

In `@extension/public/background.js`:
- Around line 68-84: The current chrome.runtime.onMessage.addListener handler
uses chrome.storage.local.set with a callback and sendResponse inside that
callback, which can be unreliable; refactor the listener
(chrome.runtime.onMessage.addListener) to an async flow: detect request.type ===
"TEXT_SELECTED", await a Promise-based storage set (wrap
chrome.storage.local.set in a Promise or use the Promise-based storage API),
then call sendResponse({status: "success"}) or sendResponse({status: "error",
error}) on failure, and still return true from the listener to keep the message
channel open; reference the listener, sendResponse, and chrome.storage.local.set
when making the change.

In `@extension/src/pages/question/Question.jsx`:
- Around line 91-95: The code may double-process qaPairsFromStorage["output"]
and uses loose equality; update the conditional logic around questionType and
output handling to avoid duplicates: change any loose check like questionType ==
"get_boolq" to strict equality (questionType === "get_boolq"), and add a
defensive guard (e.g., check qaPairsFromStorage["output"] != null) before
iterating in the block that handles boolean or non-MCQ flows so
normalizeMcqPair/other processors only run when output exists and only in the
intended branch (ensure the branch that handles get_mcq uses the same
output-exists guard and the other branch skips output when questionType ===
"get_mcq").


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b592fa9b-96a9-404c-a4ec-57f0e4d67a3e

📥 Commits

Reviewing files that changed from the base of the PR and between 2038116 and 1cc520e.

⛔ Files ignored due to path filters (2)
  • eduaid_web/package-lock.json is excluded by !**/package-lock.json
  • extension/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (17)
  • backend/server.py
  • backend/test_server.py
  • eduaid_desktop/config.js
  • eduaid_desktop/package.json
  • eduaid_web/package.json
  • eduaid_web/src/App.js
  • eduaid_web/src/pages/Output.jsx
  • eduaid_web/src/pages/PageNotFound.jsx
  • eduaid_web/src/pages/Text_Input.jsx
  • eduaid_web/src/utils/apiClient.js
  • eduaid_web/src/workers/pdfWorker.js
  • extension/package.json
  • extension/public/background.js
  • extension/public/manifest.json
  • extension/src/pages/question/Question.jsx
  • extension/src/pages/question/SidePanel.jsx
  • extension/src/pages/text_input/TextInput.jsx

Comment thread backend/server.py
Comment on lines +120 to +146
def _build_forms_service(self):
    client_secrets_file = current_app.config["CLIENT_SECRETS_FILE"]
    token_file = current_app.config["FORMS_TOKEN_FILE"]

    if not os.path.exists(client_secrets_file):
        return None

    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import client, file, tools

    store = file.Storage(token_file)
    creds = store.get()

    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(
            client_secrets_file, GOOGLE_FORMS_SCOPES
        )
        creds = tools.run_flow(flow, store)

    return discovery.build(
        "forms",
        "v1",
        http=creds.authorize(Http()),
        discoveryServiceUrl=GOOGLE_FORMS_DISCOVERY_DOC,
        static_discovery=False,
    )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

How to use Google Forms API with service account authentication instead of OAuth user flow in Python?

💡 Result:

To use the Google Forms API with service account authentication in Python (avoiding the OAuth user flow), follow these steps. Note that Google Forms is a Google Workspace service, so service accounts cannot directly access user-owned forms without additional configuration: either share the form with the service account email (simplest for existing forms) or set up domain-wide delegation (for Google Workspace domains; requires admin access).

## Prerequisites

1. Enable the Google Forms API in your Google Cloud project.
2. Create a service account and download its JSON key file (e.g., service-account.json).
3. Install required libraries: pip install google-api-python-client google-auth google-auth-httplib2 google-auth-oauthlib

## Method 1: Share Form with Service Account (Recommended for Simplicity, No Domain-Wide Delegation Needed)

- Open the Google Form in Drive.
- Share it with the service account's email (found in the JSON key as client_email, e.g., your-sa@project.iam.gserviceaccount.com) with Viewer/Editor access as needed.

## Authentication and API Usage Code

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Load service account credentials
SCOPES = [
    'https://www.googleapis.com/auth/forms.body',                # View/edit form content
    'https://www.googleapis.com/auth/forms.responses.readonly',  # Read responses
    # Add 'https://www.googleapis.com/auth/drive' if needed for form creation/sharing
]
credentials = service_account.Credentials.from_service_account_file(
    'service-account.json', scopes=SCOPES
)
# For domain-wide delegation (if set up): impersonate a user
# credentials = credentials.with_subject('user@yourdomain.com')

# Build the Forms API service
service = build('forms', 'v1', credentials=credentials)

# Example: Get form details (form must be shared with SA)
form_id = 'your_form_id_here'  # From form URL: docs.google.com/forms/d/FORM_ID/edit
result = service.forms().get(formId=form_id).execute()
print(result.get('info', {}).get('title'))

# Example: List responses
responses = service.forms().responses().list(formId=form_id).execute()
print(responses)

# Example: Create a new form (requires Drive scope; form owned by SA)
new_form = {'info': {'title': 'My Service Account Form'}}
created_form = service.forms().create(body=new_form).execute()
print(f'Created form ID: {created_form.get("formId")}')
```

## Method 2: Domain-Wide Delegation (For Google Workspace, Access Any User's Forms)

1. In Google Cloud Console > IAM & Admin > Service Accounts > your SA, enable "Domain-wide delegation" and note the Client ID.
2. In the Google Workspace Admin Console > Security > API Controls > Manage Domain Wide Delegation, add your SA's Client ID and the Forms scopes (e.g., https://www.googleapis.com/auth/forms.body,https://www.googleapis.com/auth/forms.responses.readonly).
3. In code, add credentials = credentials.with_subject('target-user@domain.com') after loading credentials (impersonate the form owner).

## Common Issues & Fixes

- 403/401 errors: ensure the form is shared with the SA email or that delegation is set up correctly. IAM roles like Owner don't apply to Workspace.
- Form creation: new forms are owned by the SA; share them via the Drive API if needed.
- Scopes: match them to the operations (see the official reference).
- The official quickstarts use OAuth; service accounts require these workarounds per sources.

This works as of 2026; test with minimal scopes.


Interactive OAuth flow will block or fail in headless server environments.

tools.run_flow() from the deprecated oauth2client library requires browser-based user authorization, which is incompatible with headless deployments (containers, CI, production servers). It will block indefinitely or raise an exception when no browser is available.

Migrate to service account authentication. Use google.oauth2.service_account with Credentials.from_service_account_file() and either:

  • Share the Google Form with the service account's email address (found in the JSON key as client_email), or
  • Enable domain-wide delegation if running on Google Workspace with admin access.

Also replace the deprecated oauth2client library with google-auth and google-auth-httplib2.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 120 - 146, The _build_forms_service function
currently uses oauth2client.tools.run_flow (tools.run_flow) which triggers an
interactive browser OAuth flow unsuitable for headless servers; replace this
with a service account flow using
google.oauth2.service_account.Credentials.from_service_account_file and the
google-auth/google-auth-httplib2 stack: remove the oauth2client/file/tools logic
and token_file usage, load the JSON key with
Credentials.from_service_account_file (or use domain-wide delegation if needed),
create an AuthorizedHttp (or pass the credentials directly to discovery.build
via the credentials parameter), and ensure the Google Form is shared with the
service account email (or configure domain delegation) so
discovery.build("forms", "v1", ...) uses these service-account credentials
instead of tools.run_flow.

Comment thread backend/server.py
Comment on lines +508 to +523
for question, options in zip(input_questions, input_options):
    if not options:
        outputs.append("")
        continue

    qa_response = qa_model(question=question, context=input_text)
    generated_answer = qa_response["answer"]

    vectorizer = TfidfVectorizer().fit_transform([*options, generated_answer])
    vectors = vectorizer.toarray()
    generated_answer_vector = vectors[-1].reshape(1, -1)
    similarities = cosine_similarity(
        vectors[:-1], generated_answer_vector
    ).flatten()
    best_option = options[similarities.argmax()]
    outputs.append(best_option)

⚠️ Potential issue | 🟡 Minor

Add strict=True to zip() for defensive validation.

Although line 500-503 validates that input_questions and input_options have the same length, using strict=True provides a built-in safeguard and is more idiomatic in Python 3.10+.

Proposed fix
-        for question, options in zip(input_questions, input_options):
+        for question, options in zip(input_questions, input_options, strict=True):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for question, options in zip(input_questions, input_options, strict=True):
    if not options:
        outputs.append("")
        continue
    qa_response = qa_model(question=question, context=input_text)
    generated_answer = qa_response["answer"]
    vectorizer = TfidfVectorizer().fit_transform([*options, generated_answer])
    vectors = vectorizer.toarray()
    generated_answer_vector = vectors[-1].reshape(1, -1)
    similarities = cosine_similarity(
        vectors[:-1], generated_answer_vector
    ).flatten()
    best_option = options[similarities.argmax()]
    outputs.append(best_option)
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 508-508: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 508 - 523, The loop using
zip(input_questions, input_options) should use zip(..., strict=True) to enforce
equal-length iterables; update the zip call inside the loop (the one iterating
over input_questions and input_options in the block that calls qa_model and
computes cosine_similarity) to zip(input_questions, input_options, strict=True)
so Python raises immediately if lengths differ (ensure runtime uses Python 3.10+
and adjust any tests or callers expecting silent truncation).

Comment thread backend/server.py
        if not subtitle_files:
            return jsonify({"error": "No subtitles found"}), 404

        latest_subtitle = max(subtitle_files, key=os.path.getctime)

⚠️ Potential issue | 🟡 Minor

Use os.path.getmtime for cross-platform consistency.

os.path.getctime returns the inode change time on Unix (not creation time), which includes metadata changes. For finding the most recently downloaded file, os.path.getmtime (modification time) is more reliable across platforms.

Proposed fix
-        latest_subtitle = max(subtitle_files, key=os.path.getctime)
+        latest_subtitle = max(subtitle_files, key=os.path.getmtime)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
latest_subtitle = max(subtitle_files, key=os.path.getctime)
latest_subtitle = max(subtitle_files, key=os.path.getmtime)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` at line 722, Replace os.path.getctime with
os.path.getmtime when computing latest_subtitle from subtitle_files to use file
modification time (cross-platform consistent); update the expression
latest_subtitle = max(subtitle_files, key=os.path.getmtime) in the code that
selects the most recent subtitle file (search around the latest_subtitle and
subtitle_files usage) and ensure os is already imported.
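The mtime-based selection can be verified with a throwaway script (a standalone sketch; the .vtt file names are made up, only the stdlib is used):

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    older = os.path.join(tmp, "older.vtt")
    newer = os.path.join(tmp, "newer.vtt")
    for path in (older, newer):
        open(path, "w").close()

    # Set modification times explicitly instead of sleeping between writes
    now = time.time()
    os.utime(older, (now - 60, now - 60))
    os.utime(newer, (now, now))

    # getmtime reflects when content was last written, on every platform
    latest = max([older, newer], key=os.path.getmtime)
    picked = os.path.basename(latest)
```

On Windows, getctime does return creation time, which is why getctime-based code can look correct in Windows-only testing; getmtime behaves consistently on both Unix and Windows.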

"build:web": "cd ../eduaid_web && npm run build",
"copy:web": "xcopy /E /I /Y \"..\\eduaid_web\\build\" \"build\"",
"build": "cd ../eduaid_web && npm run build",
"copy:web": "node -e \"const fs=require('fs');fs.rmSync('build',{recursive:true,force:true});fs.cpSync('../eduaid_web/build','build',{recursive:true});\"",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for Node.js version requirements in package.json files and any .nvmrc
echo "=== Checking package.json engines field ==="
fd -t f 'package.json' --exec grep -l "engines" {} \; 2>/dev/null | xargs -I{} sh -c 'echo "File: {}"; cat {} | jq ".engines" 2>/dev/null'

echo ""
echo "=== Checking for .nvmrc or .node-version ==="
fd -t f -g '.nvmrc' -g '.node-version'
fd -t f -g '.nvmrc' -g '.node-version' --exec cat {}

echo ""
echo "=== README Node.js requirements ==="
rg -i "node.*version|node.*[0-9]+\.[0-9]+" README.md 2>/dev/null || echo "No README.md or no Node version mentioned"

Repository: AOSSIE-Org/EduAid

Length of output: 439


🏁 Script executed:

cat README.md | head -100

Repository: AOSSIE-Org/EduAid

Length of output: 3896


🏁 Script executed:

cat eduaid_desktop/package.json | head -30

Repository: AOSSIE-Org/EduAid

Length of output: 1113


🏁 Script executed:

rg "node.*16|16\.[0-9]" -i --type md --type json

Repository: AOSSIE-Org/EduAid

Length of output: 50376


Ensure Node.js minimum version requirement is 16.7.0 or higher to support fs.cpSync.

The copy:web script uses fs.cpSync, which was added in Node.js 16.7.0. Update the minimum Node.js version in package.json (add an engines field) and in README.md files from "version 16 or higher" to "16.7.0 or higher" to prevent build failures on versions 16.0.0–16.6.x.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@eduaid_desktop/package.json` at line 11, Add a Node.js engines requirement of
at least 16.7.0 to package.json and update README references to "16.7.0 or
higher": specifically, add an "engines" field in package.json (next to existing
metadata) specifying "node": ">=16.7.0" so the build script that uses fs.cpSync
in the "copy:web" npm script is guaranteed to run, and update any README.md text
that currently says "version 16 or higher" to "16.7.0 or higher" to keep
documentation consistent with the runtime requirement.
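A minimal sketch of the suggested engines field (placement and surrounding fields are illustrative, not the full eduaid_desktop/package.json):

```json
{
  "name": "eduaid_desktop",
  "engines": {
    "node": ">=16.7.0"
  }
}
```

Note that npm only warns on an engines mismatch by default; adding engine-strict=true to a project-level .npmrc turns the mismatch into a hard install error.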

Comment on lines 68 to 84
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.type === "TEXT_SELECTED") {
    chrome.storage.local.set({
      selectedText: request.text
    }, () => {
      console.log("Text saved to storage:", request.text);
      sendResponse({ status: "success" });
    });
    return true; // Required for async sendResponse
  }
});

//Clear badge when popup is opened
chrome.action.onClicked.addListener(() => {
  chrome.action.setBadgeText({ text: "" });
});

chrome.storage.onChanged.addListener((changes, namespace) => {
  for (let [key, { oldValue, newValue }] of Object.entries(changes)) {
    console.log(
      `Storage key "${key}" in namespace "${namespace}" changed.`,
      `Old value was "${oldValue}", new value is "${newValue}".`
    );

    // Store the key-value pair in local storage
    chrome.storage.local.set({ [key]: newValue }, () => {
      if (chrome.runtime.lastError) {
        console.error("Error storing data:", chrome.runtime.lastError);
      } else {
        console.log(`Stored key-value pair: { ${key}: ${newValue} }`);
      }
    });
  }
  if (request.type === "TEXT_SELECTED") {
    chrome.storage.local.set({ selectedText: request.text }, () => {
      if (chrome.runtime.lastError) {
        console.error("Error storing selected text:", chrome.runtime.lastError);
        sendResponse({ status: "error" });
        return;
      }

      sendResponse({ status: "success" });
    });

    return true;
  }

  return false;
});

⚠️ Potential issue | 🟡 Minor

Potential issue: sendResponse after async callback may not work reliably.

When chrome.storage.local.set completes, the message channel may already be closed. While return true keeps the channel open, the callback pattern with sendResponse inside the storage callback can be unreliable.

💡 Consider using Promise-based approach
 chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
   if (request.type === "TEXT_SELECTED") {
-    chrome.storage.local.set({ selectedText: request.text }, () => {
-      if (chrome.runtime.lastError) {
-        console.error("Error storing selected text:", chrome.runtime.lastError);
-        sendResponse({ status: "error" });
-        return;
-      }
-
-      sendResponse({ status: "success" });
-    });
-
-    return true;
+    (async () => {
+      try {
+        await chrome.storage.local.set({ selectedText: request.text });
+        sendResponse({ status: "success" });
+      } catch (error) {
+        console.error("Error storing selected text:", error);
+        sendResponse({ status: "error" });
+      }
+    })();
+    return true;
   }
 
   return false;
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/public/background.js` around lines 68 - 84, The current
chrome.runtime.onMessage.addListener handler uses chrome.storage.local.set with
a callback and sendResponse inside that callback, which can be unreliable;
refactor the listener (chrome.runtime.onMessage.addListener) to an async flow:
detect request.type === "TEXT_SELECTED", await a Promise-based storage set (wrap
chrome.storage.local.set in a Promise or use the Promise-based storage API),
then call sendResponse({status: "success"}) or sendResponse({status: "error",
error}) on failure, and still return true from the listener to keep the message
channel open; reference the listener, sendResponse, and chrome.storage.local.set
when making the change.

Comment on lines +91 to 95
if (questionType === "get_mcq" && qaPairsFromStorage["output"]) {
  qaPairsFromStorage["output"].forEach((qaPair) => {
    combinedQaPairs.push({
      question: qaPair.question_statement,
      question_type: "MCQ",
      options: qaPair.options,
      answer: qaPair.answer,
      context: qaPair.context,
    });
    combinedQaPairs.push(normalizeMcqPair(qaPair));
  });
}

⚠️ Potential issue | 🟡 Minor

Logic issue: potential duplicate processing of output data.

When questionType === "get_mcq", this block processes qaPairsFromStorage["output"]. However, the subsequent block at lines 97-115 also processes output when questionType is not "get_mcq". The condition at line 104 (questionType !== "get_mcq") prevents double processing for MCQ, but the boolean check at line 97 (questionType == "get_boolq") uses loose equality and doesn't guard against output being undefined.

💡 Add defensive check for undefined output
-      if (questionType == "get_boolq") {
+      if (questionType === "get_boolq" && qaPairsFromStorage["output"]) {
         qaPairsFromStorage["output"].forEach((qaPair) => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extension/src/pages/question/Question.jsx` around lines 91 - 95, The code may
double-process qaPairsFromStorage["output"] and uses loose equality; update the
conditional logic around questionType and output handling to avoid duplicates:
change any loose check like questionType == "get_boolq" to strict equality
(questionType === "get_boolq"), and add a defensive guard (e.g., check
qaPairsFromStorage["output"] != null) before iterating in the block that handles
boolean or non-MCQ flows so normalizeMcqPair/other processors only run when
output exists and only in the intended branch (ensure the branch that handles
get_mcq uses the same output-exists guard and the other branch skips output when
questionType === "get_mcq").
