Refactor/system reliability #652
Conversation
…ions, and resolve build warnings
📝 Walkthrough
The PR refactors the Flask backend into a factory pattern with lazy-loaded services and centralized request validation. Frontend components normalize MCQ data structures and update form generation. Build configurations and dependencies are updated across packages, with stricter type handling throughout.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eduaid_web/src/pages/Output.jsx (1)
151-157: ⚠️ Potential issue | 🟡 Minor

Add defensive check for `output` before iteration.

When `questionType === "get_boolq"`, the code accesses `qaPairsFromStorage["output"].forEach(...)` without checking if `output` exists. If `output` is `undefined`, this will throw a runtime error.

🐛 Proposed fix

```diff
- if (questionType === "get_boolq") {
+ if (questionType === "get_boolq" && qaPairsFromStorage["output"]) {
    qaPairsFromStorage["output"].forEach((qaPair) => {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eduaid_web/src/pages/Output.jsx` around lines 151 - 157, Check that qaPairsFromStorage["output"] exists and is an array before calling forEach when questionType === "get_boolq"; replace the direct access with a defensive guard (e.g., verify Array.isArray(qaPairsFromStorage.output) or use optional chaining) and only push into combinedQaPairs when output is present to avoid runtime errors in Output.jsx where qaPairsFromStorage and the "output" key are used.
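For the stricter array check the prompt mentions, a minimal sketch — the `boolqOutput` variable name is illustrative, and the per-pair push is elided because the original loop body is not shown in this hunk:

```js
// Only iterate when "output" is present and actually an array.
const boolqOutput = qaPairsFromStorage["output"];
if (questionType === "get_boolq" && Array.isArray(boolqOutput)) {
  boolqOutput.forEach((qaPair) => {
    // ...push the boolean QA pair into combinedQaPairs as before
  });
}
```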
🧹 Nitpick comments (8)
eduaid_web/src/pages/Text_Input.jsx (1)
191-193: Enforce the advertised upload formats in the file picker.

The UI says PDF, DOCX, TXT are supported, but the file input still allows any file type. Add an `accept` filter to reduce invalid uploads at source.

Suggested patch

```diff
- <input type="file" ref={fileInputRef} onChange={handleFileUpload} style={{ display: "none" }} />
+ <input
+   type="file"
+   accept=".pdf,.docx,.txt,application/pdf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,text/plain"
+   ref={fileInputRef}
+   onChange={handleFileUpload}
+   style={{ display: "none" }}
+ />
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eduaid_web/src/pages/Text_Input.jsx` around lines 191 - 193, Update the hidden file input in Text_Input.jsx to enforce the advertised formats by adding an accept attribute (e.g. accept=".pdf,.docx,.txt,application/pdf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,text/plain") on the <input> that uses fileInputRef and handleFileUpload so the OS file picker filters to PDF, DOCX and TXT; keep the existing handleFileUpload validation in place for server-side safety.

eduaid_web/src/pages/Output.jsx (1)
90-108: Consider extracting `normalizeMcqPair` to a shared utility.

This function is duplicated verbatim in extension/src/pages/question/Question.jsx. Extracting it to a shared utility would reduce maintenance burden and ensure consistent behavior. A shared location like eduaid_web/src/utils/questionUtils.js (and similarly for the extension) would allow both components to import the same normalization logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@eduaid_web/src/pages/Output.jsx` around lines 90 - 108, The normalizeMcqPair function is duplicated; extract it into a shared utility (e.g., create a new function normalizeMcqPair in eduaid_web/src/utils/questionUtils.js) and replace the local implementations in Output.jsx and extension/src/pages/question/Question.jsx to import and call that exported normalizeMcqPair; ensure the exported function preserves the same signature and behavior (handles qaPair.answer as array or scalar, derives question/question_statement, sets question_type, options, answer, and context) and update imports in both files accordingly.
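A minimal sketch of what the extracted helper could look like, based only on the behavior described above — whether the real implementation takes the first element of an answer array or joins all of them is not visible here, so treat those details as assumptions:

```js
// eduaid_web/src/utils/questionUtils.js (proposed location)
export function normalizeMcqPair(qaPair) {
  // Answers may arrive as an array or a scalar; normalize to a single value.
  const answer = Array.isArray(qaPair.answer) ? qaPair.answer[0] : qaPair.answer;
  return {
    question: qaPair.question || qaPair.question_statement,
    question_type: "MCQ",
    options: qaPair.options || [],
    answer,
    context: qaPair.context,
  };
}
```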
extension/src/pages/question/SidePanel.jsx (1)

116-122: Consider adding error handling for missing form URLs.

The fallback from `edit_link` to `form_link` is correct and consistent with other components. However, if both are missing/undefined, `window.open(undefined, "_blank")` will open a blank tab.

💡 Optional: Add defensive check

```diff
  if (response.ok) {
    const result = await response.json();
    const formUrl = result.edit_link || result.form_link;
+   if (!formUrl) {
+     console.error("No form URL returned from server");
+     return;
+   }
    window.open(formUrl, "_blank");
  } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extension/src/pages/question/SidePanel.jsx` around lines 116 - 122, The response handling currently falls back to result.edit_link || result.form_link then calls window.open(formUrl, "_blank") without ensuring formUrl exists; change the logic in the SidePanel.jsx response.ok block to verify that formUrl (from result.edit_link or result.form_link) is truthy before calling window.open(formUrl, "_blank"), and when it's missing call console.error (or show a user-facing message) and skip opening a blank tab; update the code around the response.ok branch and the window.open invocation to perform this defensive check.

extension/public/background.js (1)
27-42: Redundant storage write: text is stored twice.

Line 29 stores `selectedText` in `chrome.storage.local`, and the content script at contentScript.js:9 also stores the same text to the same key when it receives the message (per context snippet 2). This redundancy is harmless but unnecessary.

Consider removing one of the storage writes to simplify the flow. Since the background script stores first (line 29), the content script's `generateQuestions` function could skip its storage call, or vice versa.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extension/public/background.js` around lines 27 - 42, Remove the redundant write by eliminating the duplicate chrome.storage.local.set call inside the content script's generateQuestions function (so the background context menu handler remains the single writer when the menu is used); update generateQuestions in contentScript.js to rely on the selectedText from the incoming message (chrome.runtime.onMessage handler) instead of storing it again or, if you prefer to keep content script storage, remove the chrome.storage.local.set call in the background context menu handler that currently runs when info.menuItemId === "askExtension". Ensure whichever write you remove still leaves selectedText available to downstream logic (message payload or storage) and keep the error handling and badge-setting logic in the background handler unchanged.

backend/test_server.py (1)
54-56: Stub behavior may not scale for larger inputs.

`StubAnswerPredictor.predict_boolean_answer` returns `[True, False][: len(payload["input_question"])]`, which caps at 2 answers regardless of input size. While the current test uses exactly 2 questions, this could cause test failures if tests are extended to larger inputs.

💡 Generate answers matching input length

```diff
  class StubAnswerPredictor:
      def predict_boolean_answer(self, payload):
-         return [True, False][: len(payload["input_question"])]
+         count = len(payload["input_question"])
+         return [i % 2 == 0 for i in range(count)]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_server.py` around lines 54 - 56, The stub in StubAnswerPredictor.predict_boolean_answer currently slices a two-element list so it never returns more than two answers; change it to generate a boolean list whose length matches len(payload["input_question"]) (e.g., repeat or compute a pattern and then slice to n) so the returned list scales with the number of questions and preserves the alternating True/False behavior for arbitrary n.

backend/server.py (3)
587-590: Consider wrapping Google Forms API calls in try-except for clearer error messages.

If `forms().create()` or `batchUpdate()` fails (quota exceeded, network error, invalid credentials), the exception will propagate as a 500 error. Catching and re-raising as `ServiceUnavailable` with a user-friendly message would improve debuggability.

Proposed improvement

```diff
+ try:
      result = form_service.forms().create(body={"info": {"title": "EduAid form"}}).execute()
      form_service.forms().batchUpdate(
          formId=result["formId"], body={"requests": requests_list}
      ).execute()
+ except Exception as exc:
+     raise ServiceUnavailable(f"Failed to create Google Form: {exc}") from exc
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 587 - 590, Wrap the Google Forms calls in a try-except: surround form_service.forms().create(...) and form_service.forms().batchUpdate(formId=result["formId"], body={"requests": requests_list}).execute() with a try block, catch specific API exceptions (e.g., googleapiclient.errors.HttpError) and a general Exception fallback, log the original exception details and then raise ServiceUnavailable with a clear, user-friendly message that includes context (e.g., "Failed to create or update Google Form") while retaining the original error in the logs for debugging; ensure you reference the result variable and requests_list when composing the message for clarity.
692-716: Subprocess call is adequately protected by input validation.

Static analysis flags this as potentially unsafe, but `video_id` is validated against `[A-Za-z0-9_-]{6,20}` on line 683, which prevents shell injection. The partial executable path (`yt-dlp`) is acceptable for system tools typically installed in standard locations.

One improvement: consider adding a timeout to prevent hanging on slow network conditions.

Optional: Add timeout

```diff
  subprocess.run(
      [
          "yt-dlp",
          "--write-auto-sub",
          "--sub-lang",
          "en",
          "--skip-download",
          "--sub-format",
          "vtt",
          "-o",
          output_template,
          youtube_url,
      ],
      check=True,
      capture_output=True,
      text=True,
+     timeout=60,
  )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 692 - 716, The subprocess.run invocation that calls yt-dlp (in the block using subprocess.run with arguments including "yt-dlp", output_template and youtube_url) should include a timeout to avoid hanging on slow networks; add a sensible timeout param (e.g. timeout=30) to that subprocess.run call and add an except subprocess.TimeoutExpired handler that logs the timeout (use current_app.logger.exception or logger) referencing video_id and returns a 504/ServiceUnavailable JSON response indicating the fetch timed out, while preserving the existing FileNotFoundError and CalledProcessError handling.
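A sketch of the timeout handling the prompt describes, assuming it slots into the route's existing exception handling — `output_template`, `youtube_url`, and `video_id` come from the surrounding code, and the 60-second value is illustrative:

```python
import subprocess

from flask import current_app, jsonify

try:
    subprocess.run(
        ["yt-dlp", "--write-auto-sub", "--sub-lang", "en", "--skip-download",
         "--sub-format", "vtt", "-o", output_template, youtube_url],
        check=True,
        capture_output=True,
        text=True,
        timeout=60,  # abort instead of hanging on slow networks
    )
except subprocess.TimeoutExpired:
    # Log with the video id for traceability, then signal a gateway timeout.
    current_app.logger.exception("yt-dlp timed out for video %s", video_id)
    return jsonify({"error": "Subtitle fetch timed out"}), 504
```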
194-199: Consider documenting the asymmetric `None` handling.

`_require_list` returns an empty list for `None` but raises `InvalidRequest` for non-list types. This behavior is reasonable (treating missing fields as empty lists) but could surprise callers expecting strict validation for all non-list inputs. A brief docstring would clarify intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 194 - 199, The helper function _require_list currently treats None as an empty list but raises InvalidRequest for any non-list value; add a brief docstring to _require_list that explains this asymmetric None handling (i.e., missing/None is normalized to [], while non-list inputs are invalid), mention the rationale (treat missing fields as empty lists) and that callers should expect InvalidRequest for wrong types, and ensure the docstring references the function name _require_list and the exception InvalidRequest so future readers understand the intended behavior.
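A minimal sketch of the documented helper, assuming a `(value, field_name)` signature — the real parameter names in server.py may differ:

```python
def _require_list(value, field_name):
    """Return ``value`` as a list.

    ``None`` is normalized to ``[]`` (missing fields are treated as empty
    lists); any other non-list value raises ``InvalidRequest``.
    """
    if value is None:
        return []
    if not isinstance(value, list):
        raise InvalidRequest(f"'{field_name}' must be a list")
    return value
```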
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Line 722: Replace os.path.getctime with os.path.getmtime when computing
latest_subtitle from subtitle_files to use file modification time
(cross-platform consistent); update the expression latest_subtitle =
max(subtitle_files, key=os.path.getmtime) in the code that selects the most
recent subtitle file (search around the latest_subtitle and subtitle_files
usage) and ensure os is already imported.
- Around line 508-523: The loop using zip(input_questions, input_options) should
use zip(..., strict=True) to enforce equal-length iterables; update the zip call
inside the loop (the one iterating over input_questions and input_options in the
block that calls qa_model and computes cosine_similarity) to
zip(input_questions, input_options, strict=True) so Python raises immediately if
lengths differ (ensure runtime uses Python 3.10+ and adjust any tests or callers
expecting silent truncation).
- Around line 120-146: The _build_forms_service function currently uses
oauth2client.tools.run_flow (tools.run_flow) which triggers an interactive
browser OAuth flow unsuitable for headless servers; replace this with a service
account flow using
google.oauth2.service_account.Credentials.from_service_account_file and the
google-auth/google-auth-httplib2 stack: remove the oauth2client/file/tools logic
and token_file usage, load the JSON key with
Credentials.from_service_account_file (or use domain-wide delegation if needed),
create an AuthorizedHttp (or pass the credentials directly to discovery.build
via the credentials parameter), and ensure the Google Form is shared with the
service account email (or configure domain delegation) so
discovery.build("forms", "v1", ...) uses these service-account credentials
instead of tools.run_flow.
In `@eduaid_desktop/package.json`:
- Line 11: Add a Node.js engines requirement of at least 16.7.0 to package.json
and update README references to "16.7.0 or higher": specifically, add an
"engines" field in package.json (next to existing metadata) specifying "node":
">=16.7.0" so the build script that uses fs.cpSync in the "copy:web" npm script
is guaranteed to run, and update any README.md text that currently says "version
16 or higher" to "16.7.0 or higher" to keep documentation consistent with the
runtime requirement.
In `@extension/public/background.js`:
- Around line 68-84: The current chrome.runtime.onMessage.addListener handler
uses chrome.storage.local.set with a callback and sendResponse inside that
callback, which can be unreliable; refactor the listener
(chrome.runtime.onMessage.addListener) to an async flow: detect request.type ===
"TEXT_SELECTED", await a Promise-based storage set (wrap
chrome.storage.local.set in a Promise or use the Promise-based storage API),
then call sendResponse({status: "success"}) or sendResponse({status: "error",
error}) on failure, and still return true from the listener to keep the message
channel open; reference the listener, sendResponse, and chrome.storage.local.set
when making the change.
In `@extension/src/pages/question/Question.jsx`:
- Around line 91-95: The code may double-process qaPairsFromStorage["output"]
and uses loose equality; update the conditional logic around questionType and
output handling to avoid duplicates: change any loose check like questionType ==
"get_boolq" to strict equality (questionType === "get_boolq"), and add a
defensive guard (e.g., check qaPairsFromStorage["output"] != null) before
iterating in the block that handles boolean or non-MCQ flows so
normalizeMcqPair/other processors only run when output exists and only in the
intended branch (ensure the branch that handles get_mcq uses the same
output-exists guard and the other branch skips output when questionType ===
"get_mcq").
---
Outside diff comments:
In `@eduaid_web/src/pages/Output.jsx`:
- Around line 151-157: Check that qaPairsFromStorage["output"] exists and is an
array before calling forEach when questionType === "get_boolq"; replace the
direct access with a defensive guard (e.g., verify
Array.isArray(qaPairsFromStorage.output) or use optional chaining) and only push
into combinedQaPairs when output is present to avoid runtime errors in
Output.jsx where qaPairsFromStorage and the "output" key are used.
---
Nitpick comments:
In `@backend/server.py`:
- Around line 587-590: Wrap the Google Forms calls in a try-except: surround
form_service.forms().create(...) and
form_service.forms().batchUpdate(formId=result["formId"], body={"requests":
requests_list}).execute() with a try block, catch specific API exceptions (e.g.,
googleapiclient.errors.HttpError) and a general Exception fallback, log the
original exception details and then raise ServiceUnavailable with a clear,
user-friendly message that includes context (e.g., "Failed to create or update
Google Form") while retaining the original error in the logs for debugging;
ensure you reference the result variable and requests_list when composing the
message for clarity.
- Around line 692-716: The subprocess.run invocation that calls yt-dlp (in the
block using subprocess.run with arguments including "yt-dlp", output_template
and youtube_url) should include a timeout to avoid hanging on slow networks; add
a sensible timeout param (e.g. timeout=30) to that subprocess.run call and add
an except subprocess.TimeoutExpired handler that logs the timeout (use
current_app.logger.exception or logger) referencing video_id and returns a
504/ServiceUnavailable JSON response indicating the fetch timed out, while
preserving the existing FileNotFoundError and CalledProcessError handling.
- Around line 194-199: The helper function _require_list currently treats None
as an empty list but raises InvalidRequest for any non-list value; add a brief
docstring to _require_list that explains this asymmetric None handling (i.e.,
missing/None is normalized to [], while non-list inputs are invalid), mention
the rationale (treat missing fields as empty lists) and that callers should
expect InvalidRequest for wrong types, and ensure the docstring references the
function name _require_list and the exception InvalidRequest so future readers
understand the intended behavior.
In `@backend/test_server.py`:
- Around line 54-56: The stub in StubAnswerPredictor.predict_boolean_answer
currently slices a two-element list so it never returns more than two answers;
change it to generate a boolean list whose length matches
len(payload["input_question"]) (e.g., repeat or compute a pattern and then slice
to n) so the returned list scales with the number of questions and preserves the
alternating True/False behavior for arbitrary n.
In `@eduaid_web/src/pages/Output.jsx`:
- Around line 90-108: The normalizeMcqPair function is duplicated; extract it
into a shared utility (e.g., create a new function normalizeMcqPair in
eduaid_web/src/utils/questionUtils.js) and replace the local implementations in
Output.jsx and extension/src/pages/question/Question.jsx to import and call that
exported normalizeMcqPair; ensure the exported function preserves the same
signature and behavior (handles qaPair.answer as array or scalar, derives
question/question_statement, sets question_type, options, answer, and context)
and update imports in both files accordingly.
In `@eduaid_web/src/pages/Text_Input.jsx`:
- Around line 191-193: Update the hidden file input in Text_Input.jsx to enforce
the advertised formats by adding an accept attribute (e.g.
accept=".pdf,.docx,.txt,application/pdf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,text/plain")
on the <input> that uses fileInputRef and handleFileUpload so the OS file picker
filters to PDF, DOCX and TXT; keep the existing handleFileUpload validation in
place for server-side safety.
In `@extension/public/background.js`:
- Around line 27-42: Remove the redundant write by eliminating the duplicate
chrome.storage.local.set call inside the content script's generateQuestions
function (so the background context menu handler remains the single writer when
the menu is used); update generateQuestions in contentScript.js to rely on the
selectedText from the incoming message (chrome.runtime.onMessage handler)
instead of storing it again or, if you prefer to keep content script storage,
remove the chrome.storage.local.set call in the background context menu handler
that currently runs when info.menuItemId === "askExtension". Ensure whichever
write you remove still leaves selectedText available to downstream logic
(message payload or storage) and keep the error handling and badge-setting logic
in the background handler unchanged.
In `@extension/src/pages/question/SidePanel.jsx`:
- Around line 116-122: The response handling currently falls back to
result.edit_link || result.form_link then calls window.open(formUrl, "_blank")
without ensuring formUrl exists; change the logic in the SidePanel.jsx
response.ok block to verify that formUrl (from result.edit_link or
result.form_link) is truthy before calling window.open(formUrl, "_blank"), and
when it's missing call console.error (or show a user-facing message) and skip
opening a blank tab; update the code around the response.ok branch and the
window.open invocation to perform this defensive check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b592fa9b-96a9-404c-a4ec-57f0e4d67a3e
⛔ Files ignored due to path filters (2)
- eduaid_web/package-lock.json is excluded by `!**/package-lock.json`
- extension/package-lock.json is excluded by `!**/package-lock.json`
📒 Files selected for processing (17)
- backend/server.py
- backend/test_server.py
- eduaid_desktop/config.js
- eduaid_desktop/package.json
- eduaid_web/package.json
- eduaid_web/src/App.js
- eduaid_web/src/pages/Output.jsx
- eduaid_web/src/pages/PageNotFound.jsx
- eduaid_web/src/pages/Text_Input.jsx
- eduaid_web/src/utils/apiClient.js
- eduaid_web/src/workers/pdfWorker.js
- extension/package.json
- extension/public/background.js
- extension/public/manifest.json
- extension/src/pages/question/Question.jsx
- extension/src/pages/question/SidePanel.jsx
- extension/src/pages/text_input/TextInput.jsx
```python
def _build_forms_service(self):
    client_secrets_file = current_app.config["CLIENT_SECRETS_FILE"]
    token_file = current_app.config["FORMS_TOKEN_FILE"]

    if not os.path.exists(client_secrets_file):
        return None

    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import client, file, tools

    store = file.Storage(token_file)
    creds = store.get()

    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(
            client_secrets_file, GOOGLE_FORMS_SCOPES
        )
        creds = tools.run_flow(flow, store)

    return discovery.build(
        "forms",
        "v1",
        http=creds.authorize(Http()),
        discoveryServiceUrl=GOOGLE_FORMS_DISCOVERY_DOC,
        static_discovery=False,
    )
```
🧩 Analysis chain
🌐 Web query:
How to use Google Forms API with service account authentication instead of OAuth user flow in Python?
💡 Result:
To use the Google Forms API with service account authentication in Python (avoiding OAuth user flow), follow these steps. Note that Google Forms is a Google Workspace service, so service accounts cannot directly access user-owned forms without additional configuration: either share the form with the service account email (simplest for existing forms) or set up domain-wide delegation (for Google Workspace domains, requires admin access).

## Prerequisites
1. Enable the Google Forms API in your Google Cloud project.
2. Create a service account and download its JSON key file (e.g., service-account.json).
3. Install required libraries: `pip install google-api-python-client google-auth google-auth-httplib2 google-auth-oauthlib`

## Method 1: Share Form with Service Account (Recommended for Simplicity, No Domain-Wide Delegation Needed)
- Open the Google Form in Drive.
- Share it with the service account's email (found in the JSON key as client_email, e.g., your-sa@project.iam.gserviceaccount.com) with Viewer/Editor access as needed.

## Authentication and API Usage Code

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Load service account credentials
SCOPES = [
    'https://www.googleapis.com/auth/forms.body',                # View/edit form content
    'https://www.googleapis.com/auth/forms.responses.readonly',  # Read responses
    # Add 'https://www.googleapis.com/auth/drive' if needed for form creation/sharing
]
credentials = service_account.Credentials.from_service_account_file(
    'service-account.json', scopes=SCOPES
)

# For domain-wide delegation (if set up): impersonate a user
# credentials = credentials.with_subject('user@yourdomain.com')

# Build the Forms API service
service = build('forms', 'v1', credentials=credentials)

# Example: Get form details (form must be shared with SA)
form_id = 'your_form_id_here'  # From form URL: docs.google.com/forms/d/FORM_ID/edit
result = service.forms().get(formId=form_id).execute()
print(result.get('info', {}).get('title'))

# Example: List responses
responses = service.forms().responses().list(formId=form_id).execute()
print(responses)

# Example: Create a new form (requires Drive scope; form owned by SA)
new_form = {'info': {'title': 'My Service Account Form'}}
created_form = service.forms().create(body=new_form).execute()
print(f'Created form ID: {created_form.get("formId")}')
```

## Method 2: Domain-Wide Delegation (For Google Workspace, Access Any User's Forms)
1. In Google Cloud Console > IAM & Admin > Service Accounts > your SA, enable "Domain-wide delegation" and note the Client ID.
2. In Google Workspace Admin Console > Security > API Controls > Manage Domain Wide Delegation, add your SA's Client ID and Forms scopes (e.g., https://www.googleapis.com/auth/forms.body, https://www.googleapis.com/auth/forms.responses.readonly).
3. In code, add `credentials = credentials.with_subject('target-user@domain.com')` after loading credentials (impersonate the form owner).

## Common Issues & Fixes
- 403/401 errors: Ensure the form is shared with the SA email or delegation is set up correctly. IAM roles like Owner don't apply to Workspace.
- Form creation: New forms are owned by the SA; share them via the Drive API if needed.
- Scopes: Match them to the operations (see official reference).
- Official quickstarts use OAuth; service accounts require these workarounds per sources.

This works as of 2026; test with minimal scopes.
Citations:
- 1: https://developers.google.com/workspace/forms/api/guides/retrieve-forms-responses
- 2: https://developers.google.com/workspace/forms/api/guides/create-form-quiz
- 3: https://developers.google.com/workspace/forms/api/reference/rest
- 4: https://developers.google.com/workspace/forms/api/quickstart/python
- 5: https://stackoverflow.com/questions/79197024/google-forms-api-raises-google-auth-exceptions-refresherror-no-access-token-in
- 6: https://developers.google.com/identity/protocols/oauth2/service-account
- 7: https://stackoverflow.com/questions/75867323/request-had-insufficient-authentication-scope-when-trying-to-create-google-for
- 8: https://endgrate.com/blog/using-the-google-forms-api-to-get-questions-in-python
- 9: https://developers.google.com/workspace/forms/api/reference/rest/v1/forms/create
- 10: https://developers.google.com/workspace/forms/api/reference/rest/v1/forms/get
Interactive OAuth flow will block or fail in headless server environments.
tools.run_flow() from the deprecated oauth2client library requires browser-based user authorization, which is incompatible with headless deployments (containers, CI, production servers). It will block indefinitely or raise an exception when no browser is available.
Migrate to service account authentication. Use google.oauth2.service_account with Credentials.from_service_account_file() and either:
- Share the Google Form with the service account's email address (found in the JSON key as `client_email`), or
- Enable domain-wide delegation if running on Google Workspace with admin access.
Also replace the deprecated oauth2client library with google-auth and google-auth-httplib2.
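Condensed into the shape of `_build_forms_service`, the migration could look roughly like this — the `SERVICE_ACCOUNT_FILE` config key is hypothetical, and the existing `GOOGLE_FORMS_SCOPES`/`GOOGLE_FORMS_DISCOVERY_DOC` constants are assumed to be reusable as-is:

```python
import os

from flask import current_app
from google.oauth2 import service_account
from googleapiclient.discovery import build

def _build_forms_service(self):
    key_file = current_app.config.get("SERVICE_ACCOUNT_FILE")  # hypothetical config key
    if not key_file or not os.path.exists(key_file):
        return None

    creds = service_account.Credentials.from_service_account_file(
        key_file, scopes=GOOGLE_FORMS_SCOPES
    )
    # Forms created this way are owned by the service account; share them
    # with end users (or use domain-wide delegation) as noted above.
    return build(
        "forms",
        "v1",
        credentials=creds,
        discoveryServiceUrl=GOOGLE_FORMS_DISCOVERY_DOC,
        static_discovery=False,
    )
```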
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 120 - 146, The _build_forms_service function
currently uses oauth2client.tools.run_flow (tools.run_flow) which triggers an
interactive browser OAuth flow unsuitable for headless servers; replace this
with a service account flow using
google.oauth2.service_account.Credentials.from_service_account_file and the
google-auth/google-auth-httplib2 stack: remove the oauth2client/file/tools logic
and token_file usage, load the JSON key with
Credentials.from_service_account_file (or use domain-wide delegation if needed),
create an AuthorizedHttp (or pass the credentials directly to discovery.build
via the credentials parameter), and ensure the Google Form is shared with the
service account email (or configure domain delegation) so
discovery.build("forms", "v1", ...) uses these service-account credentials
instead of tools.run_flow.
```python
for question, options in zip(input_questions, input_options):
    if not options:
        outputs.append("")
        continue

    qa_response = qa_model(question=question, context=input_text)
    generated_answer = qa_response["answer"]

    vectorizer = TfidfVectorizer().fit_transform([*options, generated_answer])
    vectors = vectorizer.toarray()
    generated_answer_vector = vectors[-1].reshape(1, -1)
    similarities = cosine_similarity(
        vectors[:-1], generated_answer_vector
    ).flatten()
    best_option = options[similarities.argmax()]
    outputs.append(best_option)
```
Add strict=True to zip() for defensive validation.
Although lines 500-503 validate that `input_questions` and `input_options` have the same length, using `strict=True` provides a built-in safeguard and is more idiomatic in Python 3.10+.
Proposed fix

```diff
- for question, options in zip(input_questions, input_options):
+ for question, options in zip(input_questions, input_options, strict=True):
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
for question, options in zip(input_questions, input_options, strict=True):
    if not options:
        outputs.append("")
        continue

    qa_response = qa_model(question=question, context=input_text)
    generated_answer = qa_response["answer"]

    vectorizer = TfidfVectorizer().fit_transform([*options, generated_answer])
    vectors = vectorizer.toarray()
    generated_answer_vector = vectors[-1].reshape(1, -1)
    similarities = cosine_similarity(
        vectors[:-1], generated_answer_vector
    ).flatten()
    best_option = options[similarities.argmax()]
    outputs.append(best_option)
```
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 508-508: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 508 - 523, The loop using
zip(input_questions, input_options) should use zip(..., strict=True) to enforce
equal-length iterables; update the zip call inside the loop (the one iterating
over input_questions and input_options in the block that calls qa_model and
computes cosine_similarity) to zip(input_questions, input_options, strict=True)
so Python raises immediately if lengths differ (ensure runtime uses Python 3.10+
and adjust any tests or callers expecting silent truncation).
```python
if not subtitle_files:
    return jsonify({"error": "No subtitles found"}), 404

latest_subtitle = max(subtitle_files, key=os.path.getctime)
```
Use os.path.getmtime for cross-platform consistency.
os.path.getctime returns the inode change time on Unix (not creation time), which includes metadata changes. For finding the most recently downloaded file, os.path.getmtime (modification time) is more reliable across platforms.
Proposed fix

```diff
- latest_subtitle = max(subtitle_files, key=os.path.getctime)
+ latest_subtitle = max(subtitle_files, key=os.path.getmtime)
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
latest_subtitle = max(subtitle_files, key=os.path.getmtime)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` at line 722, Replace os.path.getctime with
os.path.getmtime when computing latest_subtitle from subtitle_files to use file
modification time (cross-platform consistent); update the expression
latest_subtitle = max(subtitle_files, key=os.path.getmtime) in the code that
selects the most recent subtitle file (search around the latest_subtitle and
subtitle_files usage) and ensure os is already imported.
| "build:web": "cd ../eduaid_web && npm run build", | ||
| "copy:web": "xcopy /E /I /Y \"..\\eduaid_web\\build\" \"build\"", | ||
| "build": "cd ../eduaid_web && npm run build", | ||
| "copy:web": "node -e \"const fs=require('fs');fs.rmSync('build',{recursive:true,force:true});fs.cpSync('../eduaid_web/build','build',{recursive:true});\"", |
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check for Node.js version requirements in package.json files and any .nvmrc
echo "=== Checking package.json engines field ==="
fd -t f 'package.json' --exec grep -l "engines" {} \; 2>/dev/null | xargs -I{} sh -c 'echo "File: {}"; cat {} | jq ".engines" 2>/dev/null'
echo ""
echo "=== Checking for .nvmrc or .node-version ==="
fd -t f -g '.nvmrc' -g '.node-version'
fd -t f -g '.nvmrc' -g '.node-version' --exec cat {}
echo ""
echo "=== README Node.js requirements ==="
rg -i "node.*version|node.*[0-9]+\.[0-9]+" README.md 2>/dev/null || echo "No README.md or no Node version mentioned"
```

Repository: AOSSIE-Org/EduAid
Length of output: 439
🏁 Script executed:
```bash
cat README.md | head -100
```

Repository: AOSSIE-Org/EduAid
Length of output: 3896
🏁 Script executed:
```bash
cat eduaid_desktop/package.json | head -30
```

Repository: AOSSIE-Org/EduAid
Length of output: 1113
🏁 Script executed:
rg "node.*16|16\.[0-9]" -i --type md --type jsonRepository: AOSSIE-Org/EduAid
Length of output: 50376
Ensure Node.js minimum version requirement is 16.7.0 or higher to support fs.cpSync.
The copy:web script uses fs.cpSync, which was added in Node.js 16.7.0. Update the minimum Node.js version in package.json (add an engines field) and in README.md files from "version 16 or higher" to "16.7.0 or higher" to prevent build failures on versions 16.0.0–16.6.x.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@eduaid_desktop/package.json` at line 11, Add a Node.js engines requirement of
at least 16.7.0 to package.json and update README references to "16.7.0 or
higher": specifically, add an "engines" field in package.json (next to existing
metadata) specifying "node": ">=16.7.0" so the build script that uses fs.cpSync
in the "copy:web" npm script is guaranteed to run, and update any README.md text
that currently says "version 16 or higher" to "16.7.0 or higher" to keep
documentation consistent with the runtime requirement.
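For concreteness, the `engines` entry could look like this in eduaid_desktop/package.json (placement among the existing top-level fields is flexible):

```json
{
  "engines": {
    "node": ">=16.7.0"
  }
}
```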
```diff
 chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
-  if (request.type === "TEXT_SELECTED") {
-    chrome.storage.local.set({
-      selectedText: request.text
-    }, () => {
-      console.log("Text saved to storage:", request.text);
-      sendResponse({ status: "success" });
-    });
-    return true; // Required for async sendResponse
-  }
-});
-
-//Clear badge when popup is opened
-chrome.action.onClicked.addListener(() => {
-  chrome.action.setBadgeText({ text: "" });
-});
-
-chrome.storage.onChanged.addListener((changes, namespace) => {
-  for (let [key, { oldValue, newValue }] of Object.entries(changes)) {
-    console.log(
-      `Storage key "${key}" in namespace "${namespace}" changed.`,
-      `Old value was "${oldValue}", new value is "${newValue}".`
-    );
-
-    // Store the key-value pair in local storage
-    chrome.storage.local.set({ [key]: newValue }, () => {
-      if (chrome.runtime.lastError) {
-        console.error("Error storing data:", chrome.runtime.lastError);
-      } else {
-        console.log(`Stored key-value pair: { ${key}: ${newValue} }`);
-      }
-    });
-  }
+  if (request.type === "TEXT_SELECTED") {
+    chrome.storage.local.set({ selectedText: request.text }, () => {
+      if (chrome.runtime.lastError) {
+        console.error("Error storing selected text:", chrome.runtime.lastError);
+        sendResponse({ status: "error" });
+        return;
+      }
+
+      sendResponse({ status: "success" });
+    });
+
+    return true;
+  }
+
+  return false;
+});
```
Potential issue: sendResponse after async callback may not work reliably.
When chrome.storage.local.set completes, the message channel may already be closed. While return true keeps the channel open, the callback pattern with sendResponse inside the storage callback can be unreliable.
💡 Consider using Promise-based approach

```diff
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
if (request.type === "TEXT_SELECTED") {
- chrome.storage.local.set({ selectedText: request.text }, () => {
- if (chrome.runtime.lastError) {
- console.error("Error storing selected text:", chrome.runtime.lastError);
- sendResponse({ status: "error" });
- return;
- }
-
- sendResponse({ status: "success" });
- });
-
- return true;
+ (async () => {
+ try {
+ await chrome.storage.local.set({ selectedText: request.text });
+ sendResponse({ status: "success" });
+ } catch (error) {
+ console.error("Error storing selected text:", error);
+ sendResponse({ status: "error" });
+ }
+ })();
+ return true;
}
return false;
});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@extension/public/background.js` around lines 68 - 84, The current
chrome.runtime.onMessage.addListener handler uses chrome.storage.local.set with
a callback and sendResponse inside that callback, which can be unreliable;
refactor the listener (chrome.runtime.onMessage.addListener) to an async flow:
detect request.type === "TEXT_SELECTED", await a Promise-based storage set (wrap
chrome.storage.local.set in a Promise or use the Promise-based storage API),
then call sendResponse({status: "success"}) or sendResponse({status: "error",
error}) on failure, and still return true from the listener to keep the message
channel open; reference the listener, sendResponse, and chrome.storage.local.set
when making the change.
```diff
 if (questionType === "get_mcq" && qaPairsFromStorage["output"]) {
   qaPairsFromStorage["output"].forEach((qaPair) => {
-    combinedQaPairs.push({
-      question: qaPair.question_statement,
-      question_type: "MCQ",
-      options: qaPair.options,
-      answer: qaPair.answer,
-      context: qaPair.context,
-    });
+    combinedQaPairs.push(normalizeMcqPair(qaPair));
   });
 }
```
Logic issue: potential duplicate processing of output data.
When questionType === "get_mcq", this block processes qaPairsFromStorage["output"]. However, the subsequent block at lines 97-115 also processes output when questionType is not "get_mcq". The condition at line 104 (questionType !== "get_mcq") prevents double processing for MCQ, but the boolean check at line 97 (questionType == "get_boolq") uses loose equality and doesn't guard against output being undefined.
💡 Add defensive check for undefined output
```diff
- if (questionType == "get_boolq") {
+ if (questionType === "get_boolq" && qaPairsFromStorage["output"]) {
    qaPairsFromStorage["output"].forEach((qaPair) => {
```

🤖 Prompt for AI Agents
qaPairsFromStorage["output"].forEach((qaPair) => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@extension/src/pages/question/Question.jsx` around lines 91 - 95, The code may
double-process qaPairsFromStorage["output"] and uses loose equality; update the
conditional logic around questionType and output handling to avoid duplicates:
change any loose check like questionType == "get_boolq" to strict equality
(questionType === "get_boolq"), and add a defensive guard (e.g., check
qaPairsFromStorage["output"] != null) before iterating in the block that handles
boolean or non-MCQ flows so normalizeMcqPair/other processors only run when
output exists and only in the intended branch (ensure the branch that handles
get_mcq uses the same output-exists guard and the other branch skips output when
questionType === "get_mcq").
Summary
This PR improves EduAid's reliability and maintainability across the backend, web app, browser extension, and desktop app.
Main improvements in this PR:
- Added a `/health` endpoint, safer file upload handling, and safer transcript-fetching behavior.
- Centralized request validation, including `max_questions`.
I tested these changes locally with:
- `python3 -m py_compile backend/server.py backend/test_server.py`
- `python3 backend/test_server.py`
- `npm run build` in `eduaid_web`
- `npm run build` in `extension`
- `npm run build` in `eduaid_desktop`

Addressed Issues:
N/A - Proactive architectural and reliability audit in preparation for GSoC 2026.
Screenshots/Recordings:
N/A for most of this PR because the main changes are reliability, API, build, and integration improvements.
Additional Notes:
There is a known `caniuse-lite` warning from the tooling stack in some environments, but it is not introduced by this PR and does not block successful builds.

AI Usage Disclosure:
I have used the following AI models and tools: OpenAI, Gemini for refactoring assistance, test scaffolding, and review support. I reviewed, edited, and tested all changes locally before submission.
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Chores