
[NSoC'26] Add Bulk Insert Endpoint to Improve Data API Performance#134

Open
Priyanka0205-CSE wants to merge 4 commits into geturbackend:main from Priyanka0205-CSE:main

Conversation


@Priyanka0205-CSE Priyanka0205-CSE commented Apr 24, 2026

What’s this PR about?

This PR introduces a new bulk insert endpoint that allows multiple records to be added in a single request. The goal is to improve performance by reducing the number of API calls required for large data operations.


What I implemented

  • Added a new endpoint:
    /api/data/:collectionName/bulk

  • Implemented bulkInsertData using MongoDB’s insertMany for efficient insertion

  • Added validation to ensure the request body is an array of valid objects

  • Returned proper error messages for invalid records (including failed indices)

  • Integrated existing middlewares (API key validation, usage limits, auth, etc.)
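The payload checks described above can be sketched as a standalone helper. The name `validateBulkBody` and the exact error shape are hypothetical; the actual controller also validates each record against the collection schema:

```javascript
// Hypothetical sketch of the bulk-payload checks listed above.
// The real controller additionally runs per-record schema validation.
const MAX_BATCH_SIZE = 100;

function validateBulkBody(body) {
  if (!Array.isArray(body)) {
    return { ok: false, status: 400, error: 'Request body must be an array of objects.' };
  }
  if (body.length === 0) {
    return { ok: false, status: 400, error: 'Array must not be empty.' };
  }
  if (body.length > MAX_BATCH_SIZE) {
    return { ok: false, status: 400, error: `At most ${MAX_BATCH_SIZE} records per request.` };
  }
  // Collect indices of items that are not plain objects (failed indices
  // are reported back to the caller, per the PR description).
  const invalidIndices = body
    .map((item, i) => (item && typeof item === 'object' && !Array.isArray(item) ? -1 : i))
    .filter((i) => i !== -1);
  if (invalidIndices.length > 0) {
    return { ok: false, status: 400, error: 'Invalid records.', invalidIndices };
  }
  return { ok: true };
}
```

A controller would run this before any schema validation or database work, so oversized or malformed batches are rejected cheaply.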


Why this is useful

  • Supports high-volume data operations (100+ records in one request)

  • Reduces network overhead and improves performance

  • Useful for migrations, batch uploads, and analytics use cases


How I tested it

  • Tested using Thunder Client (VS Code)
  • Verified endpoint response
  • Received 401 Unauthorized due to API key validation, confirming:
    ✔ route is correctly configured
    ✔ endpoint is reachable

Notes

  • Followed existing project structure and middleware flow
  • Ensured compatibility with current data validation system

Summary by CodeRabbit

  • New Features

    • Bulk-insert endpoint to submit multiple records (array or single object), with per-item validation and a 100-item limit; returns the inserted count.
  • Behavior

    • Write operations now assign owner per item when applicable and enforce per-item ownership checks.
  • Bug Fixes

    • Requests with invalid items are rejected with clear 400 responses listing offending indices; empty-array payloads are rejected.
  • Refactor

    • Route formatting and data-listing control flow simplified for clarity.

Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6d1e418-4aad-4495-adc9-5bba76004142

📥 Commits

Reviewing files that changed from the base of the PR and between e396d5f and e9df8ec.

📒 Files selected for processing (1)
  • apps/public-api/src/utils/AppError.js

📝 Walkthrough

Walkthrough

Adds a new POST /:collectionName/bulk endpoint and controller that checks the body is a non-empty array (max 100 items), validates each item against the collection schema, and rejects the whole request with a 400 listing invalid indices if any fail; validated records are inserted via insertMany and a 201 with insertedCount is returned. Also refactors getAllData branching, updates write authorization to per-item handling, and adds an AppError class.

Changes

  • Bulk Insert Controller (apps/public-api/src/controllers/data.controller.js): Adds bulkInsertData(req, res, next): enforces that the body is a non-empty array (<= 100 items), resolves the collection, validates each element against the collection schema, and rejects the whole request with 400 and invalid indices if any fail; uses insertMany(..., ordered: true) and returns 201 with insertedCount. Refactors the getAllData count branch and formatting.

  • Routes (apps/public-api/src/routes/data.js): Adds the POST /:collectionName/bulk route wired through the existing middleware chain (API key, collection block, usage limit, auth context, authorizeWriteOperation) and imports bulkInsertData; existing routes reformatted to multi-line declarations.

  • Write Authorization Middleware (apps/public-api/src/middlewares/authorizeWriteOperation.js): For POST, normalizes req.body to an array, rejects empty arrays, enforces that each element is a plain object, assigns authUserId into each item's ownerField when missing/null/empty, and enforces per-item owner equality with index-specific 403 messages. PUT/PATCH/DELETE logic kept, but error payloads reformatted.

  • AppError Utility (apps/public-api/src/utils/AppError.js): Adds an AppError class extending Error with (message, statusCode), sets isOperational = true, captures the stack trace, and exports the class.
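Based on the summary above, the class amounts to a few lines; a sketch consistent with that description (the real file may differ in details):

```javascript
// Sketch of the AppError utility as summarized above: extends Error,
// takes (message, statusCode), marks the error as operational, and
// captures a stack trace from the constructor call site.
class AppError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.statusCode = statusCode;
    this.isOperational = true; // distinguishes expected failures from programming bugs
    Error.captureStackTrace(this, this.constructor);
  }
}

module.exports = AppError;
```

A centralized error handler can then branch on isOperational to decide whether to expose the message to the client or log it and return a generic 500.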

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Route as Route
    participant Middleware as Middleware
    participant Controller as Controller
    participant Database as Database

    Client->>Route: POST /:collectionName/bulk (array payload)
    Route->>Middleware: run API key, collection block, usage limit, auth context, authorizeWriteOperation
    Middleware->>Controller: forward normalized per-item body
    Controller->>Controller: assert array & size (<=100)
    Controller->>Controller: validate each item (collect invalid indices)
    alt any invalid
        Controller->>Client: 400 Bad Request (invalid indices list)
    else all valid
        Controller->>Database: insertMany(records, ordered: true)
        Database->>Controller: insertion result (insertedCount)
        Controller->>Client: 201 Created (insertedCount)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • Issue #101: Implements the same POST /:collectionName/bulk bulk endpoint with per-item validation and index-specific error reporting.

Possibly related PRs

  • PR #115: Modifies the same controller (apps/public-api/src/controllers/data.controller.js), specifically getAllData flow and regex maxTimeMS handling.

Suggested labels

enhancement, NSOC'26, level-2

Suggested reviewers

  • yash-pouranik

Poem

🐰 I hopped in with a careful stack,
Checked each record, front to back.
Owners set and indices scanned,
Inserted true — the rows all stand.
A joyful thump — bulk write, well planned.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly and specifically describes the main change: adding a bulk insert endpoint to improve API performance.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Linked Issues Check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ Passed: Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
apps/public-api/src/routes/data.js (1)

22-22: Nit: the "MUST BE FIRST" comment is misleading.

POST /:collectionName and POST /:collectionName/bulk are distinct paths (Express path params don't span /), so ordering isn't required for correctness here. Consider dropping the emphasis to avoid future confusion.
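A quick illustration of why ordering is irrelevant here: the pattern Express compiles for /:collectionName only matches a single path segment, so /users/bulk never reaches it. The regex below is a simplified stand-in for what path-to-regexp actually generates:

```javascript
// Simplified version of the pattern Express compiles for '/:collectionName':
// exactly one segment, and the captured parameter cannot contain '/'.
const singleSegment = /^\/([^/]+)\/?$/;

console.log(singleSegment.test('/users'));       // matches: one segment
console.log(singleSegment.test('/users/bulk'));  // no match: param can't span '/'
```

So POST /:collectionName/bulk and POST /:collectionName are disjoint regardless of registration order.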

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/routes/data.js` at line 22, Update the misleading comment
"✅ BULK INSERT ROUTE (MUST BE FIRST)" in the data routes so it no longer implies
ordering is required; change it to a neutral note like "✅ BULK INSERT ROUTE" or
"Bulk insert route (separate from POST /:collectionName)" near the POST
/:collectionName/bulk route to clarify that ordering is not necessary because
Express path params do not span '/'.
apps/public-api/src/controllers/data.controller.js (1)

176-184: Webhook dispatch loop scales poorly for large batches.

dispatchWebhooks does a Webhook.find({ projectId, enabled: true }) on every call and then fire-and-forgets an enqueue per webhook. Calling it once per inserted doc means N DB lookups and N×W enqueue operations for a single bulk request, racing on the event loop and (for large N) flooding the webhook queue.

Consider either:

  • Adding a dispatchWebhooksBulk({ action, documents }) that fetches matching webhooks once and enqueues a single payload per webhook (array of documents), or
  • At minimum, resolving webhooks once outside the loop and reusing the list, with bounded concurrency (e.g. p-limit) for the per-doc enqueues.
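The first option could be sketched as follows. findWebhooks and enqueue are hypothetical stand-ins for the project's Webhook.find and queue calls, injected as parameters here so the batching logic is visible on its own:

```javascript
// Hypothetical sketch of dispatchWebhooksBulk: one DB lookup for the
// whole batch, then one payload per webhook carrying all documents.
// `findWebhooks` and `enqueue` stand in for the real Webhook.find(...)
// and webhook-queue calls.
async function dispatchWebhooksBulk({ projectId, collection, action, documents, findWebhooks, enqueue }) {
  const webhooks = await findWebhooks({ projectId, enabled: true }); // single lookup
  for (const webhook of webhooks) {
    // One enqueue per webhook with the full document array, instead of N×W enqueues.
    await enqueue(webhook, { projectId, collection, action, documents });
  }
  return webhooks.length;
}
```

Compared to calling dispatchWebhooks per document, this turns N lookups and N×W enqueues into 1 lookup and W enqueues for a batch of N documents.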
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/controllers/data.controller.js` around lines 176 - 184,
The loop calls dispatchWebhooks for each doc causing N DB lookups and N×W
enqueues; fix by fetching matching webhooks once and enqueuing per-webhook
batches: call Webhook.find({ projectId: project._id, enabled: true }) once
(outside result.forEach) and either implement a new dispatchWebhooksBulk({
projectId: project._id, collection: collectionName, action: 'insert', documents:
result.map(d => d.toObject ? d.toObject() : d) }) that enqueues one payload per
webhook, or reuse the fetched webhook list and enqueue per-doc with bounded
concurrency (e.g., p-limit) inside the existing loop; update or add functions
dispatchWebhooksBulk and/or adjust dispatchWebhooks to accept an array of
documents and remove per-doc Webhook.find calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 167-205: The current insertMany call (Model.insertMany(..., {
ordered: false })) can throw a BulkWriteError while still persisting some docs,
but the catch treats it like a full 409 duplicate and skips
accounting/webhooks/response updates; either change to ordered: true to
guarantee all-or-nothing, or explicitly handle BulkWriteError: inspect
err.insertedDocs / err.result.insertedCount, increment Project.updateOne({ $inc:
{ databaseUsed: <size of insertedDocs> } }) for only those inserted docs, call
dispatchWebhooks for each doc in err.insertedDocs (use doc.toObject ?
doc.toObject() : doc), and return a partial-success response (e.g., status 207
or a 200 with inserted and failed lists and details) instead of a simple 409;
also keep existing isDuplicateKeyError handling for pure duplicate errors.
- Around line 107-113: The controller currently accepts incomingData arrays with
no size bounds and allows empty arrays, causing no-op inserts and potential
memory/BSON-limit/execution issues; add a guard at the top of the handler that
rejects empty arrays and enforces a MAX_BATCH_SIZE (e.g., 1000) before any heavy
work: check Array.isArray(incomingData), if incomingData.length === 0 return
res.status(400) with a clear error, if incomingData.length > MAX_BATCH_SIZE
return res.status(413) (or 400) with a "batch too large" error; perform this
check before running sanitize, JSON.stringify, schema validation, or
Model.insertMany so checkUsageLimits and downstream logic only run for
valid-sized batches (apply the same checks to the other handler block referenced
around the 124-147 region).
- Around line 110-147: The current handler returns raw { error: ... } objects
and exposes Mongo errors; update all branches in this controller (the validation
branch that checks incomingData array, the collection missing check using
collectionConfig, the invalidIndices response, success response after pushing
sanitized validData, and any 409/500 catch blocks) to use the AppError class for
failures and to always return the standard envelope { success: bool, data: {},
message: "" } (put insertedCount inside data, e.g. data: { insertedCount,
records: [...] } and include a user-friendly message). Replace any direct use of
err.message/details in responses with a logged server-side error (console or
logger) and return a sanitized generic message to the client; continue using
validateData and sanitize for input processing but do not leak Mongo error text
to responses. Ensure 404/400 use AppError and the same envelope shape.

In `@apps/public-api/src/routes/data.js`:
- Around line 23-31: authorizeWriteOperation currently assumes req.body is a
single object; update it to detect Array.isArray(req.body) and handle arrays by
iterating over each record: for each element, read ownerField, if
missing/null/empty set it to authUserId on the record (not on the array), and if
present ensure String(record[ownerField]) === authUserId else reject the
request; additionally, for array mode validate/normalize/_reject provided _id on
each element the same way you do for single-object mode (e.g., prevent clients
from setting conflicting IDs), and preserve existing single-object behavior when
req.body is not an array. Ensure the middleware still calls next() only if all
records pass these per-record checks.

---

Nitpick comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 176-184: The loop calls dispatchWebhooks for each doc causing N DB
lookups and N×W enqueues; fix by fetching matching webhooks once and enqueuing
per-webhook batches: call Webhook.find({ projectId: project._id, enabled: true
}) once (outside result.forEach) and either implement a new
dispatchWebhooksBulk({ projectId: project._id, collection: collectionName,
action: 'insert', documents: result.map(d => d.toObject ? d.toObject() : d) })
that enqueues one payload per webhook, or reuse the fetched webhook list and
enqueue per-doc with bounded concurrency (e.g., p-limit) inside the existing
loop; update or add functions dispatchWebhooksBulk and/or adjust
dispatchWebhooks to accept an array of documents and remove per-doc Webhook.find
calls.

In `@apps/public-api/src/routes/data.js`:
- Line 22: Update the misleading comment "✅ BULK INSERT ROUTE (MUST BE FIRST)"
in the data routes so it no longer implies ordering is required; change it to a
neutral note like "✅ BULK INSERT ROUTE" or "Bulk insert route (separate from
POST /:collectionName)" near the POST /:collectionName/bulk route to clarify
that ordering is not necessary because Express path params do not span '/'.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86205242-8fa5-4ba5-879e-ae4a540092b7

📥 Commits

Reviewing files that changed from the base of the PR and between 4c26d34 and ac84e83.

📒 Files selected for processing (2)
  • apps/public-api/src/controllers/data.controller.js
  • apps/public-api/src/routes/data.js

Comment thread: apps/public-api/src/controllers/data.controller.js (Outdated)
Comment on lines +167 to +205
const result = await Model.insertMany(validData, { ordered: false });

if (!project.resources.db.isExternal) {
  await Project.updateOne(
    { _id: project._id },
    { $inc: { databaseUsed: totalSize } },
  );
}

result.forEach((doc) => {
  dispatchWebhooks({
    projectId: project._id,
    collection: collectionName,
    action: 'insert',
    document: doc.toObject ? doc.toObject() : doc,
    documentId: doc._id,
  });
});

if (isDebug) console.log(`[DEBUG] bulk insert took ${(performance.now() - start).toFixed(2)}ms`);

return res.status(201).json({
  success: true,
  insertedCount: result.length,
  data: result,
});
} catch (err) {
  console.error(err);

  if (isDuplicateKeyError(err)) {
    return res.status(409).json({
      error: "Duplicate value violates unique constraint.",
      details: err.message,
    });
  }

  res.status(500).json({ error: err.message });
}
};
Contributor

@coderabbitai coderabbitai Bot Apr 24, 2026


⚠️ Potential issue | 🔴 Critical

Critical: ordered: false partial-success path silently loses state.

With ordered: false, insertMany throws a BulkWriteError on the first constraint violation while still inserting the remaining valid docs (they live on err.insertedDocs / err.result.insertedCount). The current catch maps that to a 409 Duplicate as if nothing was written, so for those successfully-persisted docs:

  • Project.databaseUsed is never incremented, so usage accounting drifts below reality.
  • dispatchWebhooks is never called, so downstream consumers miss insert events for real documents in the DB.
  • The client gets a response implying no write happened, even though some records are live.

Either switch to ordered: true (so nothing is inserted on any failure), or keep ordered: false and handle BulkWriteError explicitly: compute insertedDocs, increment databaseUsed by their real byte size, dispatch webhooks for each, and return a 207-style partial-success payload that lists failed indices/reasons.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/controllers/data.controller.js` around lines 167 - 205,
The current insertMany call (Model.insertMany(..., { ordered: false })) can
throw a BulkWriteError while still persisting some docs, but the catch treats it
like a full 409 duplicate and skips accounting/webhooks/response updates; either
change to ordered: true to guarantee all-or-nothing, or explicitly handle
BulkWriteError: inspect err.insertedDocs / err.result.insertedCount, increment
Project.updateOne({ $inc: { databaseUsed: <size of insertedDocs> } }) for only
those inserted docs, call dispatchWebhooks for each doc in err.insertedDocs (use
doc.toObject ? doc.toObject() : doc), and return a partial-success response
(e.g., status 207 or a 200 with inserted and failed lists and details) instead
of a simple 409; also keep existing isDuplicateKeyError handling for pure
duplicate errors.

Collaborator


is this resolved @coderabbitai??

Contributor


@yash-pouranik, no, this issue is not yet resolved in the current code. The bulkInsertData function still uses { ordered: false } and the catch block still only checks isDuplicateKeyError without any special handling for BulkWriteError.

Concretely, these three problems remain:

  1. databaseUsed drift — partially-inserted docs never increment Project.databaseUsed.
  2. Missed webhooks: dispatchWebhooks is never called for docs that were actually persisted.
  3. Misleading client response — the client receives a 409 implying nothing was written, even though some records are live in the DB.

To fix this, the catch block needs to explicitly handle BulkWriteError. Here's a concrete suggestion:

const result = await Model.insertMany(validData, { ordered: false });
// ... (success path unchanged)
} catch (err) {
  console.error(err);

  // BulkWriteError: some docs may have been inserted despite the error
  if (err.name === 'MongoBulkWriteError' || err.name === 'BulkWriteError') {
    const insertedDocs = err.insertedDocs ?? [];

    if (!project.resources.db.isExternal && insertedDocs.length > 0) {
      const insertedSize = Buffer.byteLength(JSON.stringify(insertedDocs));
      await Project.updateOne(
        { _id: project._id },
        { $inc: { databaseUsed: insertedSize } },
      );
    }

    insertedDocs.forEach((doc) => {
      dispatchWebhooks({
        projectId: project._id,
        collection: collectionName,
        action: 'insert',
        document: doc.toObject ? doc.toObject() : doc,
        documentId: doc._id,
      });
    });

    return res.status(207).json({
      success: false,
      insertedCount: insertedDocs.length,
      data: insertedDocs,
      errors: err.writeErrors?.map((e) => ({
        index: e.index,
        message: e.errmsg,
      })) ?? [],
    });
  }

  if (isDuplicateKeyError(err)) {
    return res.status(409).json({
      error: "Duplicate value violates unique constraint.",
      details: err.message,
    });
  }

  res.status(500).json({ error: err.message });
}

Alternatively, the simpler fix is to switch to ordered: true (all-or-nothing semantics), which avoids partial-success entirely and keeps the current catch logic correct.


🧠 Learnings used
Learnt from: CR
Repo: geturbackend/urBackend PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-21T21:00:02.427Z
Learning: When changing collection behavior, check: schema validation in `packages/common`, model compilation in `packages/common/src/utils/injectModel.js`, dashboard create collection UI, and dashboard-api collection creation/index creation flow.

Comment on lines +23 to +31
router.post(
  '/:collectionName/bulk',
  verifyApiKey,
  blockUsersCollectionDataAccess,
  checkUsageLimits,
  resolvePublicAuthContext,
  authorizeWriteOperation,
  bulkInsertData
);
Contributor

@coderabbitai coderabbitai Bot Apr 24, 2026


⚠️ Potential issue | 🔴 Critical

Critical: authorizeWriteOperation is not array-aware — RLS is fully bypassed for bulk inserts.

The bulk route reuses authorizeWriteOperation, but that middleware treats req.body as a single object:

const incomingOwner = req.body?.[ownerField];
if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
    req.body[ownerField] = authUserId;   // on an Array this sets a property on the array, not on records
    return next();
}
if (String(incomingOwner) !== authUserId) { ... }  // never runs for arrays

With req.body being an array, incomingOwner is always undefined, so:

  • The owner-mismatch check is skipped (a caller can embed any ownerField value in each record and it will pass through).
  • Auto-assigning authUserId attaches it to the array object, not to each record, so the records are still missing the owner field (or retain whatever the caller sent).

Net effect: authenticated users can write records owned by other users via /bulk, a complete RLS bypass. Please update authorizeWriteOperation to branch on Array.isArray(req.body) and apply the owner check/auto-assignment to each element (and reject/normalize _id for each), before this route is shipped.
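The per-item check the comment asks for can be sketched as a pure helper. The name, signature, and return shape are illustrative; the real middleware operates on req.body and res directly:

```javascript
// Hypothetical per-item owner enforcement for bulk bodies, as described
// in the comment above. Returns { ok: true, items } with owners filled
// in, or { ok: false, status, index } on the first failing record.
function enforceOwnership(body, ownerField, authUserId) {
  const isBulk = Array.isArray(body);
  const items = isBulk ? body : [body];

  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    if (!item || typeof item !== 'object' || Array.isArray(item)) {
      return { ok: false, status: 400, index: isBulk ? i : undefined };
    }
    const owner = item[ownerField];
    if (owner === undefined || owner === null || owner === '') {
      item[ownerField] = authUserId; // assign on the record, not on the array
    } else if (String(owner) !== authUserId) {
      return { ok: false, status: 403, index: isBulk ? i : undefined };
    }
  }
  return { ok: true, items };
}
```

The key difference from the single-object code quoted above is that the owner check and auto-assignment run once per record, so an attacker cannot smuggle foreign ownerField values inside an array.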

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/routes/data.js` around lines 23 - 31,
authorizeWriteOperation currently assumes req.body is a single object; update it
to detect Array.isArray(req.body) and handle arrays by iterating over each
record: for each element, read ownerField, if missing/null/empty set it to
authUserId on the record (not on the array), and if present ensure
String(record[ownerField]) === authUserId else reject the request; additionally,
for array mode validate/normalize/_reject provided _id on each element the same
way you do for single-object mode (e.g., prevent clients from setting
conflicting IDs), and preserve existing single-object behavior when req.body is
not an array. Ensure the middleware still calls next() only if all records pass
these per-record checks.

Collaborator


@Priyanka0205-CSE
This is not resolved yet.

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@yash-pouranik
Collaborator

@Priyanka0205-CSE
Thank you for the PR - Please address the comments added by coderabbitai

@Priyanka0205-CSE
Author

@yash-pouranik
Thanks for the detailed review! I’ll address all the comments and update the PR shortly.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
apps/public-api/src/middlewares/authorizeWriteOperation.js (1)

60-83: Include the offending item index in array-mode error responses.

For array bodies, the loop fails fast on the first bad item or owner mismatch without telling the caller which index failed. The PR objective for /bulk explicitly aims to "return error messages for invalid records (including failed indices)", and the controller already does per-index reporting — surfacing the index here keeps middleware errors consistent and debuggable for bulk callers.

Also, since callers can now legitimately send an array, consider whether an empty array should be rejected here or deferred to the controller (currently it silently passes through to next()).

♻️ Suggested tweak
-            const bodyItems = Array.isArray(req.body) ? req.body : [req.body];
+            const isBulk = Array.isArray(req.body);
+            const bodyItems = isBulk ? req.body : [req.body];
 
-            for (const item of bodyItems) {
+            for (let i = 0; i < bodyItems.length; i++) {
+                const item = bodyItems[i];
                 if (!item || typeof item !== 'object' || Array.isArray(item)) {
                     return res.status(400).json({
                         error: 'Invalid request body',
-                        message: 'Request body must be an object or an array of objects.'
+                        message: 'Request body must be an object or an array of objects.',
+                        ...(isBulk ? { index: i } : {})
                     });
                 }
 
                 const incomingOwner = item?.[ownerField];
 
                 if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
                     item[ownerField] = authUserId;
                     continue;
                 }
 
                 if (String(incomingOwner) !== authUserId) {
                     return res.status(403).json({
                         error: 'RLS owner mismatch',
-                        message: `You can only create documents with ${ownerField} equal to your user id.`
+                        message: `You can only create documents with ${ownerField} equal to your user id.`,
+                        ...(isBulk ? { index: i } : {})
                     });
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/middlewares/authorizeWriteOperation.js` around lines 60 -
83, The middleware authorizeWriteOperation iterates bodyItems but returns
generic errors without the offending array index and also silently allows an
empty array; update the loop and error responses so that when req.body is an
array you include the index (i) in both the 400 "Invalid request body" response
and the 403 "RLS owner mismatch" response (use bodyItems and
ownerField/authUserId to build the message), and add explicit handling for an
empty array (either reject with 400 and a clear message including that it’s
empty, or defer to the controller by calling next()—choose and implement the
desired behavior consistently).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/public-api/src/middlewares/authorizeWriteOperation.js`:
- Around line 60-83: The middleware authorizeWriteOperation iterates bodyItems
but returns generic errors without the offending array index and also silently
allows an empty array; update the loop and error responses so that when req.body
is an array you include the index (i) in both the 400 "Invalid request body"
response and the 403 "RLS owner mismatch" response (use bodyItems and
ownerField/authUserId to build the message), and add explicit handling for an
empty array (either reject with 400 and a clear message including that it’s
empty, or defer to the controller by calling next()—choose and implement the
desired behavior consistently).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a7910006-7abb-447b-8b2b-5987b5b359ae

📥 Commits

Reviewing files that changed from the base of the PR and between ac84e83 and eb34f14.

📒 Files selected for processing (1)
  • apps/public-api/src/middlewares/authorizeWriteOperation.js

@Priyanka0205-CSE
Author

@yash-pouranik
I’ve addressed the nitpick comment by improving the error handling:
  • Added validation for empty array inputs
  • Included index-based error messages for better clarity and debugging

Please take a look when you have time. Thank you for the feedback!

@Priyanka0205-CSE
Author

@yash-pouranik
I’ve addressed all the review comments:
  • Added batch validation (empty array and max limit)
  • Updated the response format to match project standards
  • Removed exposure of raw MongoDB errors
  • Fixed the partial insert issue using ordered: true
  • Fixed the security issue by updating authorizeWriteOperation to handle array inputs

Please review again. Thank you!

@yash-pouranik
Collaborator

@Priyanka0205-CSE
Hey, I think you haven't committed the last changes? Please check. Meanwhile I am reviewing your last commit.


Comment on lines 52 to -54
if (method === 'POST') {
const incomingOwner = req.body?.[ownerField];

Collaborator


Why did you remove this line: const incomingOwner = req.body?.[ownerField];?

@Priyanka0205-CSE


@Priyanka0205-CSE
Author

@yash-pouranik
Hi @yash-pouranik,

I removed that line because with the bulk insert endpoint, req.body can now be an array of objects instead of a single object.

So instead of reading ownerField from req.body, I now iterate over bodyItems and validate incomingOwner for each item individually.

Let me know if you’d prefer a different approach.
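The per-item owner handling described above can be sketched as a standalone function. This is an illustrative sketch only: the names checkBulkOwners, ownerField, and authUserId are assumptions for the example, not the exact code in this PR.

```javascript
// Hypothetical sketch of the per-item owner check: wrap a single-object
// body in an array, then validate/assign the owner field per item.
function checkBulkOwners(body, ownerField, authUserId) {
  // Accept either a single object or an array of objects.
  const items = Array.isArray(body) ? body : [body];

  for (let i = 0; i < items.length; i++) {
    const item = items[i];

    if (!item || typeof item !== 'object' || Array.isArray(item)) {
      return { ok: false, status: 400, message: `Item at index ${i} must be a valid object` };
    }

    const incomingOwner = item[ownerField];

    // Missing owner: default it to the authenticated user.
    if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
      item[ownerField] = authUserId;
      continue;
    }

    // Present but mismatched owner: reject the whole request.
    if (String(incomingOwner) !== authUserId) {
      return { ok: false, status: 403, message: `Item at index ${i} must have ${ownerField} equal to your user id` };
    }
  }

  return { ok: true, items };
}
```

Because `[req.body]` wraps the same object reference, assigning `item[ownerField]` also mutates a single-object body, which is why this generalizes the old single-object check.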

@yash-pouranik
Collaborator

@Priyanka0205-CSE Yeah, that's okay. Please commit; I will check.

@yash-pouranik
Collaborator

I don't know which comments you have worked on, but in the next commit please fix everything; it's better to use the suggestions added.

@Priyanka0205-CSE
Author

@yash-pouranik

Thanks for the feedback!

I’ve now addressed all the review comments and improved the validation logic and error messages for clarity and consistency.

Please review again. Let me know if anything else needs improvement.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
apps/public-api/src/controllers/data.controller.js (1)

137-156: Optional: short-circuit validation when an invalid item is found.

Since the contract is "any invalid item rejects the whole batch", the loop can return on the first failure instead of running validateData + sanitize for every remaining record (and uselessly populating validData, which is then discarded). For 100 items where the first is malformed, you currently still pay 99 schema validations. Switching to a for...of (or plain for) with an early return next(new AppError(...)) is both faster and clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/controllers/data.controller.js` around lines 137 - 156,
The loop currently uses incomingData.forEach and continues validating/sanitizing
after an invalid item is found, populating validData and invalidIndices
unnecessarily; change the iteration to a for...of (or plain for) over
incomingData, validate each item with validateData(item, schemaRules), and as
soon as an invalid item is detected call return next(new AppError(`Invalid
records at index: ${index}`, 400)) to short-circuit and avoid further calls to
validateData and sanitize, removing the accumulated invalidIndices/validData
logic since the contract rejects the whole batch on first failure.
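The short-circuit suggested above can be sketched as follows. The validateData stub here is a stand-in for the project's real schema validator, and validateBatch is a hypothetical name for the extracted loop; only the early-return shape is the point.

```javascript
// Stand-in for the project's schema validator: reports a missing
// required field as an error, otherwise passes the item through.
function validateData(item, rules) {
  for (const field of rules.required || []) {
    if (item[field] === undefined) return { error: `missing ${field}` };
  }
  return { error: null, cleanData: item };
}

// Validate a bulk payload, returning on the FIRST invalid record
// instead of scanning (and sanitizing) the entire batch.
function validateBatch(incomingData, schemaRules) {
  const validData = [];

  for (let index = 0; index < incomingData.length; index++) {
    const item = incomingData[index];

    if (!item || typeof item !== 'object' || Array.isArray(item)) {
      return { error: `Invalid records at index: ${index}` }; // short-circuit
    }

    const { error, cleanData } = validateData(item, schemaRules);
    if (error) {
      return { error: `Invalid records at index: ${index}` }; // short-circuit
    }

    validData.push(cleanData);
  }

  return { error: null, validData };
}
```

Since the contract rejects the whole batch on any failure, nothing is lost by stopping early, and a malformed first record no longer pays for the remaining validations.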
apps/public-api/src/middlewares/authorizeWriteOperation.js (1)

60-92: Inconsistent indentation in the new POST block.

The wrapping logic itself is correct (the [req.body] wrapper preserves the reference, so mutations like item[ownerField] = authUserId still propagate to a single-object body), but lines 62‑92 mix 0/4/8/10‑space indents inside a block whose surrounding code uses 12/16. Please re-indent so the new block lines up with line 60 and the existing if (ownerField === '_id') block above.

♻️ Suggested re-indent
             const bodyItems = Array.isArray(req.body) ? req.body : [req.body];

-          if (bodyItems.length === 0) {
-    return res.status(400).json({
-        error: 'Invalid request body',
-        message: 'Request body cannot be an empty array.'
-    });
-}
-
-for (let i = 0; i < bodyItems.length; i++) {
-    const item = bodyItems[i];
-
-    if (!item || typeof item !== 'object' || Array.isArray(item)) {
-        return res.status(400).json({
-            error: 'Invalid request body',
-           message: `Item at index ${i} must be a valid object`
-        });
-    }
-
-    const incomingOwner = item?.[ownerField];
-
-    if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
-        item[ownerField] = authUserId;
-        continue;
-    }
-
-    if (String(incomingOwner) !== authUserId) {
-        return res.status(403).json({
-            error: 'RLS owner mismatch',
-            message: `Item at index ${i} must have ${ownerField} equal to your user id`
-        });
-    }
-}
+            if (bodyItems.length === 0) {
+                return res.status(400).json({
+                    error: 'Invalid request body',
+                    message: 'Request body cannot be an empty array.'
+                });
+            }
+
+            for (let i = 0; i < bodyItems.length; i++) {
+                const item = bodyItems[i];
+
+                if (!item || typeof item !== 'object' || Array.isArray(item)) {
+                    return res.status(400).json({
+                        error: 'Invalid request body',
+                        message: `Item at index ${i} must be a valid object`
+                    });
+                }
+
+                const incomingOwner = item?.[ownerField];
+
+                if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
+                    item[ownerField] = authUserId;
+                    continue;
+                }
+
+                if (String(incomingOwner) !== authUserId) {
+                    return res.status(403).json({
+                        error: 'RLS owner mismatch',
+                        message: `Item at index ${i} must have ${ownerField} equal to your user id`
+                    });
+                }
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/middlewares/authorizeWriteOperation.js` around lines 60 -
92, The new POST body-handling block inside authorizeWriteOperation is
mis-indented versus the surrounding code; reformat the block that declares
bodyItems and the subsequent for-loop/validation so its indentation aligns with
the existing block that starts at the if (ownerField === '_id') statement (match
the 12/16-space nesting used there). Keep the same logic and variable names
(bodyItems, item, incomingOwner, ownerField, authUserId) but adjust each line's
leading spaces so the entire block shares the same indentation level as the
surrounding code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 175-178: The catch block in the bulk-insert handler currently
always returns a 500; change it to detect duplicate-key errors (use
isDuplicateKeyError(err)) and route those to a 409 using AppError (do not expose
err.message), mirroring the pattern used by insertData and updateSingleData: if
isDuplicateKeyError(err) return next(new AppError("Duplicate key error", 409));
otherwise fall through to the existing next(new AppError("Failed to insert bulk
data", 500)); ensure you reference the same next and AppError usage as in this
controller.
- Around line 100-179: bulkInsertData currently skips quota checks, usage
accounting, and webhook dispatch present in insertData; fix by adding the same
pre-flight quota validation (check project.databaseUsed + totalSize >
project.databaseLimit) before calling Model.insertMany, incrementing
Project.databaseUsed by the total inserted document size and persisting the
project after successful insertMany, and dispatching dispatchWebhooks for each
inserted record (use action: 'insert' and include project/_id/collection info)
after insertion; locate and mirror the relevant logic from insertData (the
blocks around Model.create, Project.databaseUsed updates, and dispatchWebhooks)
and adapt them to work with Model.insertMany and the array of validData returned
by insertMany.

---

Nitpick comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 137-156: The loop currently uses incomingData.forEach and
continues validating/sanitizing after an invalid item is found, populating
validData and invalidIndices unnecessarily; change the iteration to a for...of
(or plain for) over incomingData, validate each item with validateData(item,
schemaRules), and as soon as an invalid item is detected call return next(new
AppError(`Invalid records at index: ${index}`, 400)) to short-circuit and avoid
further calls to validateData and sanitize, removing the accumulated
invalidIndices/validData logic since the contract rejects the whole batch on
first failure.

In `@apps/public-api/src/middlewares/authorizeWriteOperation.js`:
- Around line 60-92: The new POST body-handling block inside
authorizeWriteOperation is mis-indented versus the surrounding code; reformat
the block that declares bodyItems and the subsequent for-loop/validation so its
indentation aligns with the existing block that starts at the if (ownerField ===
'_id') statement (match the 12/16-space nesting used there). Keep the same logic
and variable names (bodyItems, item, incomingOwner, ownerField, authUserId) but
adjust each line's leading spaces so the entire block shares the same
indentation level as the surrounding code.
🪄 Autofix (Beta)

❌ Autofix failed (check again to retry)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96857de1-258b-425a-9531-10ed713a603f

📥 Commits

Reviewing files that changed from the base of the PR and between eb34f14 and e396d5f.

📒 Files selected for processing (2)
  • apps/public-api/src/controllers/data.controller.js
  • apps/public-api/src/middlewares/authorizeWriteOperation.js

Comment on lines +100 to +179
// BULK INSERT DATA

module.exports.bulkInsertData = async (req, res, next) => {
  try {
    const MAX_BULK_INSERT_LIMIT = 100;

    const { collectionName } = req.params;
    const project = req.project;
    const incomingData = req.body;

    if (!Array.isArray(incomingData)) {
      return next(new AppError("Request body must be an array of objects", 400));
    }

    if (incomingData.length === 0) {
      return next(new AppError("Request body cannot be empty", 400));
    }

    if (incomingData.length > MAX_BULK_INSERT_LIMIT) {
      return next(
        new AppError(`Maximum ${MAX_BULK_INSERT_LIMIT} records allowed`, 400)
      );
    }

    const collectionConfig = project.collections.find(
      (c) => c.name === collectionName
    );

    if (!collectionConfig) {
      return next(new AppError("Collection not found", 404));
    }

    const schemaRules = collectionConfig.model;

    const validData = [];
    const invalidIndices = [];

    incomingData.forEach((item, index) => {
      if (!item || typeof item !== "object" || Array.isArray(item)) {
        invalidIndices.push(index);
        return;
      }

      const { error, cleanData } = validateData(item, schemaRules);

      if (error) {
        invalidIndices.push(index);
      } else {
        validData.push(sanitize(cleanData));
      }
    });

    if (invalidIndices.length > 0) {
      return next(
        new AppError(`Invalid records at index: ${invalidIndices.join(", ")}`, 400)
      );
    }

    const connection = await getConnection(project._id);
    const Model = getCompiledModel(
      connection,
      collectionConfig,
      project._id,
      project.resources.db.isExternal
    );

    const result = await Model.insertMany(validData, { ordered: true });

    return res.status(201).json({
      success: true,
      data: {
        insertedCount: result.length,
      },
      message: "Bulk insert successful",
    });
  } catch (err) {
    console.error(err);
    return next(new AppError("Failed to insert bulk data", 500));
  }
};
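Given the 100-record cap above, a caller with a larger dataset has to split it into multiple requests. A minimal client-side chunking helper (chunkRecords is a hypothetical name, not part of this PR) could look like:

```javascript
// Split an array of records into batches no larger than the
// endpoint's per-request limit (100 by default).
function chunkRecords(records, maxPerRequest = 100) {
  const chunks = [];
  for (let i = 0; i < records.length; i += maxPerRequest) {
    chunks.push(records.slice(i, i + maxPerRequest));
  }
  return chunks;
}
```

Each chunk would then be POSTed to /api/data/:collectionName/bulk in turn.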
Contributor

⚠️ Potential issue | 🔴 Critical

Critical: bulk handler skips quota enforcement, usage accounting, and webhook dispatch.

bulkInsertData doesn't replicate the three side effects that insertData performs around Model.create (lines 51‑57, 69‑74, 76‑82). Compared to a 100x single-insert flow, this bulk path silently:

  1. Bypasses the database limit — there is no pre-flight project.databaseUsed + size > project.databaseLimit check, so a caller can drive any project past its plan quota in one request.
  2. Loses usage accounting: Project.databaseUsed is never incremented after insertMany, so subsequent quota checks underestimate real storage usage indefinitely (drift is permanent until reseeded).
  3. Skips webhooks: dispatchWebhooks({ action: 'insert', ... }) is never called for any of the inserted docs, so downstream consumers miss insert events for real records that exist in the DB. This silently breaks integrations (zaps, audit pipelines, sync mirrors, etc.).

All three are reachable on the happy path. Please mirror the insertData handler:

🛡️ Suggested fix (parity with insertData)
     if (invalidIndices.length > 0) {
       return next(
         new AppError(`Invalid records at index: ${invalidIndices.join(", ")}`, 400)
       );
     }

+    let totalSize = 0;
+    if (!project.resources.db.isExternal) {
+      totalSize = Buffer.byteLength(JSON.stringify(validData));
+      if ((project.databaseUsed || 0) + totalSize > project.databaseLimit) {
+        return next(new AppError("Database limit exceeded.", 403));
+      }
+    }
+
     const connection = await getConnection(project._id);
     const Model = getCompiledModel(
       connection,
       collectionConfig,
       project._id,
       project.resources.db.isExternal
     );

     const result = await Model.insertMany(validData, { ordered: true });

+    if (!project.resources.db.isExternal) {
+      await Project.updateOne(
+        { _id: project._id },
+        { $inc: { databaseUsed: totalSize } },
+      );
+    }
+
+    result.forEach((doc) => {
+      dispatchWebhooks({
+        projectId: project._id,
+        collection: collectionName,
+        action: 'insert',
+        document: doc.toObject ? doc.toObject() : doc,
+        documentId: doc._id,
+      });
+    });
+
     return res.status(201).json({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/controllers/data.controller.js` around lines 100 - 179,
bulkInsertData currently skips quota checks, usage accounting, and webhook
dispatch present in insertData; fix by adding the same pre-flight quota
validation (check project.databaseUsed + totalSize > project.databaseLimit)
before calling Model.insertMany, incrementing Project.databaseUsed by the total
inserted document size and persisting the project after successful insertMany,
and dispatching dispatchWebhooks for each inserted record (use action: 'insert'
and include project/_id/collection info) after insertion; locate and mirror the
relevant logic from insertData (the blocks around Model.create,
Project.databaseUsed updates, and dispatchWebhooks) and adapt them to work with
Model.insertMany and the array of validData returned by insertMany.
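The pre-flight quota check asked for above can be isolated as a small predicate. This is a sketch under the review's assumptions (databaseUsed, databaseLimit, and resources.db.isExternal fields on the project; payload size estimated via JSON serialization), and wouldExceedQuota is a hypothetical helper name.

```javascript
// Estimate the serialized size of the batch and check it against the
// project's remaining database quota BEFORE calling insertMany.
// External databases are assumed to be unmetered, as in the review diff.
function wouldExceedQuota(validData, project) {
  if (project.resources.db.isExternal) return false;

  // Buffer.byteLength counts UTF-8 bytes, not JS string length.
  const totalSize = Buffer.byteLength(JSON.stringify(validData));

  return (project.databaseUsed || 0) + totalSize > project.databaseLimit;
}
```

The same totalSize value would then be used for the `$inc: { databaseUsed: totalSize }` update after a successful insert, keeping the check and the accounting consistent.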

Comment on lines +175 to +178
} catch (err) {
console.error(err);
return next(new AppError("Failed to insert bulk data", 500));
}
Contributor

⚠️ Potential issue | 🟠 Major

Catch block masks duplicate-key errors as 500.

Every other write handler in this file (insertData line 89, updateSingleData line 460) routes isDuplicateKeyError(err) to a 409. Here the catch unconditionally returns a generic 500 "Failed to insert bulk data", so a single unique-constraint violation in the batch becomes indistinguishable from an actual server error — clients can't disambiguate "fix your payload" from "retry later", and ordered:true means the failing index is recoverable info you're discarding.

Please branch on isDuplicateKeyError(err) (without exposing err.message, per the controller guideline) before falling through to the generic 500.

🛡️ Suggested fix
   } catch (err) {
     console.error(err);
+    if (isDuplicateKeyError(err)) {
+      return next(new AppError("Duplicate value violates unique constraint.", 409));
+    }
     return next(new AppError("Failed to insert bulk data", 500));
   }

As per coding guidelines: "All API endpoints return: { success: bool, data: {}, message: "" }. Use AppError class for errors — never raw throw, never expose MongoDB errors to client."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/public-api/src/controllers/data.controller.js` around lines 175 - 178,
The catch block in the bulk-insert handler currently always returns a 500;
change it to detect duplicate-key errors (use isDuplicateKeyError(err)) and
route those to a 409 using AppError (do not expose err.message), mirroring the
pattern used by insertData and updateSingleData: if isDuplicateKeyError(err)
return next(new AppError("Duplicate key error", 409)); otherwise fall through to
the existing next(new AppError("Failed to insert bulk data", 500)); ensure you
reference the same next and AppError usage as in this controller.
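For context on the helper the review references: MongoDB reports unique-index violations with error code 11000, and insertMany failures surface them inside a bulk-write error's writeErrors array. A minimal isDuplicateKeyError along those lines (the project's real helper may differ) could be:

```javascript
// Detect a MongoDB duplicate-key error, either as a single error
// (code 11000) or buried in a bulk-write error's writeErrors list.
function isDuplicateKeyError(err) {
  if (!err) return false;
  if (err.code === 11000) return true;
  return Array.isArray(err.writeErrors) &&
    err.writeErrors.some((e) => e && e.code === 11000);
}
```

Branching on this in the catch block lets the handler map constraint violations to 409 without leaking the raw MongoDB message to the client.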

@coderabbitai
Contributor

coderabbitai Bot commented Apr 26, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

⚠️ Branch updated during autofix.

The branch was updated while autofix was in progress. Please try again.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 26, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

⚠️ Branch updated during autofix.

The branch was updated while autofix was in progress. Please try again.

@yash-pouranik
Collaborator

@Priyanka0205-CSE
Two more things to update:

  • comments from CodeRabbit
  • CI checks are failing (dashboard-api tests, public-api tests)

Also, please update your branch with geturbackend's main.

@yash-pouranik
Collaborator

@Priyanka0205-CSE CodeRabbit comments are still not resolved.

Author

@Priyanka0205-CSE Priyanka0205-CSE left a comment


Thanks for the feedback! I’ll go through the coderabbit comments and update my branch with the latest main.

@yash-pouranik
Collaborator

@Priyanka0205-CSE
I am referring to this

#134 (review)



Development

Successfully merging this pull request may close these issues.

[Backend] Bulk Operations Endpoint **Difficulty**: Intermediate | **Area**: apps/public-api

2 participants