[NSoC'26] Add Bulk Insert Endpoint to Improve Data API Performance#134
Priyanka0205-CSE wants to merge 4 commits into geturbackend:main
Conversation
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
Adds a new POST /:collectionName/bulk endpoint and controller that validates a non-empty array (max 100 items), validates each item against the collection schema, and rejects the whole request with a 400 listing the invalid indices if any fail; validated records are inserted via insertMany and a 201 with insertedCount is returned. Also refactors getAllData branching, updates write authorization to per-item handling, and adds an AppError class.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client as Client
participant Route as Route
participant Middleware as Middleware
participant Controller as Controller
participant Database as Database
Client->>Route: POST /:collectionName/bulk (array payload)
Route->>Middleware: run API key, collection block, usage limit, auth context, authorizeWriteOperation
Middleware->>Controller: forward normalized per-item body
Controller->>Controller: assert array & size (<=100)
Controller->>Controller: validate each item (collect invalid indices)
alt any invalid
Controller->>Client: 400 Bad Request (invalid indices list)
else all valid
Controller->>Database: insertMany(records, ordered: true)
Database->>Controller: insertion result (insertedCount)
Controller->>Client: 201 Created (insertedCount)
end
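To make the flow above concrete, a client call might look roughly like this (the base URL, the x-api-key header name, and the posts collection are illustrative assumptions, not details confirmed by this PR):

// Hypothetical client call against the new bulk endpoint.
const res = await fetch("https://api.example.com/api/data/posts/bulk", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": "<project-api-key>", // assumed header name checked by verifyApiKey
  },
  body: JSON.stringify([
    { title: "First post" },
    { title: "Second post" },
  ]),
});
// Expected on success, per the controller added in this PR:
// 201 { success: true, data: { insertedCount: 2 }, message: "Bulk insert successful" }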
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 4
🧹 Nitpick comments (2)
apps/public-api/src/routes/data.js (1)
22-22: Nit: the "MUST BE FIRST" comment is misleading.
POST /:collectionName and POST /:collectionName/bulk are distinct paths (Express path params don't span '/'), so ordering isn't required for correctness here. Consider dropping the emphasis to avoid future confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/public-api/src/routes/data.js` at line 22, Update the misleading comment "✅ BULK INSERT ROUTE (MUST BE FIRST)" in the data routes so it no longer implies ordering is required; change it to a neutral note like "✅ BULK INSERT ROUTE" or "Bulk insert route (separate from POST /:collectionName)" near the POST /:collectionName/bulk route to clarify that ordering is not necessary because Express path params do not span '/'.
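As a standalone illustration of that routing behavior (a minimal Express sketch, not this repo's actual route file):

const express = require("express");
const app = express();

// Registration order does not matter for these two routes:
app.post("/:collectionName", (req, res) => res.send("single insert"));
app.post("/:collectionName/bulk", (req, res) => res.send("bulk insert"));

// POST /posts      -> "single insert"  (:collectionName = "posts")
// POST /posts/bulk -> "bulk insert"    (a path param never matches across "/")
app.listen(3000);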
apps/public-api/src/controllers/data.controller.js (1)
176-184: Webhook dispatch loop scales poorly for large batches.
dispatchWebhooks does a Webhook.find({ projectId, enabled: true }) on every call and then fire-and-forgets an enqueue per webhook. Calling it once per inserted doc means N DB lookups and N×W enqueue operations for a single bulk request, racing on the event loop and (for large N) flooding the webhook queue. Consider either:
- Adding a dispatchWebhooksBulk({ action, documents }) that fetches matching webhooks once and enqueues a single payload per webhook (an array of documents), or
- At minimum, resolving webhooks once outside the loop and reusing the list, with bounded concurrency (e.g. p-limit) for the per-doc enqueues.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/public-api/src/controllers/data.controller.js` around lines 176 - 184, The loop calls dispatchWebhooks for each doc causing N DB lookups and N×W enqueues; fix by fetching matching webhooks once and enqueuing per-webhook batches: call Webhook.find({ projectId: project._id, enabled: true }) once (outside result.forEach) and either implement a new dispatchWebhooksBulk({ projectId: project._id, collection: collectionName, action: 'insert', documents: result.map(d => d.toObject ? d.toObject() : d) }) that enqueues one payload per webhook, or reuse the fetched webhook list and enqueue per-doc with bounded concurrency (e.g., p-limit) inside the existing loop; update or add functions dispatchWebhooksBulk and/or adjust dispatchWebhooks to accept an array of documents and remove per-doc Webhook.find calls.
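A rough sketch of what such a batched dispatcher could look like (the Webhook query mirrors the one described above; enqueueWebhookDelivery is an assumed stand-in for whatever enqueue helper the repo actually uses):

// Hypothetical batched variant: one DB lookup and one payload per webhook, regardless of batch size.
async function dispatchWebhooksBulk({ projectId, collection, action, documents }) {
  const webhooks = await Webhook.find({ projectId, enabled: true }); // single lookup per request
  for (const webhook of webhooks) {
    // enqueueWebhookDelivery is an assumed helper, not an existing function in this repo.
    enqueueWebhookDelivery(webhook, { collection, action, documents });
  }
}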
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 167-205: The current insertMany call (Model.insertMany(..., {
ordered: false })) can throw a BulkWriteError while still persisting some docs,
but the catch treats it like a full 409 duplicate and skips
accounting/webhooks/response updates; either change to ordered: true to
guarantee all-or-nothing, or explicitly handle BulkWriteError: inspect
err.insertedDocs / err.result.insertedCount, increment Project.updateOne({ $inc:
{ databaseUsed: <size of insertedDocs> } }) for only those inserted docs, call
dispatchWebhooks for each doc in err.insertedDocs (use doc.toObject ?
doc.toObject() : doc), and return a partial-success response (e.g., status 207
or a 200 with inserted and failed lists and details) instead of a simple 409;
also keep existing isDuplicateKeyError handling for pure duplicate errors.
- Around line 107-113: The controller currently accepts incomingData arrays with
no size bounds and allows empty arrays, causing no-op inserts and potential
memory/BSON-limit/execution issues; add a guard at the top of the handler that
rejects empty arrays and enforces a MAX_BATCH_SIZE (e.g., 1000) before any heavy
work: check Array.isArray(incomingData), if incomingData.length === 0 return
res.status(400) with a clear error, if incomingData.length > MAX_BATCH_SIZE
return res.status(413) (or 400) with a "batch too large" error; perform this
check before running sanitize, JSON.stringify, schema validation, or
Model.insertMany so checkUsageLimits and downstream logic only run for
valid-sized batches (apply the same checks to the other handler block referenced
around the 124-147 region).
- Around line 110-147: The current handler returns raw { error: ... } objects
and exposes Mongo errors; update all branches in this controller (the validation
branch that checks incomingData array, the collection missing check using
collectionConfig, the invalidIndices response, success response after pushing
sanitized validData, and any 409/500 catch blocks) to use the AppError class for
failures and to always return the standard envelope { success: bool, data: {},
message: "" } (put insertedCount inside data, e.g. data: { insertedCount,
records: [...] } and include a user-friendly message). Replace any direct use of
err.message/details in responses with a logged server-side error (console or
logger) and return a sanitized generic message to the client; continue using
validateData and sanitize for input processing but do not leak Mongo error text
to responses. Ensure 404/400 use AppError and the same envelope shape.
In `@apps/public-api/src/routes/data.js`:
- Around line 23-31: authorizeWriteOperation currently assumes req.body is a
single object; update it to detect Array.isArray(req.body) and handle arrays by
iterating over each record: for each element, read ownerField, if
missing/null/empty set it to authUserId on the record (not on the array), and if
present ensure String(record[ownerField]) === authUserId else reject the
request; additionally, for array mode validate/normalize/_reject provided _id on
each element the same way you do for single-object mode (e.g., prevent clients
from setting conflicting IDs), and preserve existing single-object behavior when
req.body is not an array. Ensure the middleware still calls next() only if all
records pass these per-record checks.
---
Nitpick comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 176-184: The loop calls dispatchWebhooks for each doc causing N DB
lookups and N×W enqueues; fix by fetching matching webhooks once and enqueuing
per-webhook batches: call Webhook.find({ projectId: project._id, enabled: true
}) once (outside result.forEach) and either implement a new
dispatchWebhooksBulk({ projectId: project._id, collection: collectionName,
action: 'insert', documents: result.map(d => d.toObject ? d.toObject() : d) })
that enqueues one payload per webhook, or reuse the fetched webhook list and
enqueue per-doc with bounded concurrency (e.g., p-limit) inside the existing
loop; update or add functions dispatchWebhooksBulk and/or adjust
dispatchWebhooks to accept an array of documents and remove per-doc Webhook.find
calls.
In `@apps/public-api/src/routes/data.js`:
- Line 22: Update the misleading comment "✅ BULK INSERT ROUTE (MUST BE FIRST)"
in the data routes so it no longer implies ordering is required; change it to a
neutral note like "✅ BULK INSERT ROUTE" or "Bulk insert route (separate from
POST /:collectionName)" near the POST /:collectionName/bulk route to clarify
that ordering is not necessary because Express path params do not span '/'.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 86205242-8fa5-4ba5-879e-ae4a540092b7
📒 Files selected for processing (2)
apps/public-api/src/controllers/data.controller.js
apps/public-api/src/routes/data.js
    const result = await Model.insertMany(validData, { ordered: false });

    if (!project.resources.db.isExternal) {
      await Project.updateOne(
        { _id: project._id },
        { $inc: { databaseUsed: totalSize } },
      );
    }

    result.forEach((doc) => {
      dispatchWebhooks({
        projectId: project._id,
        collection: collectionName,
        action: 'insert',
        document: doc.toObject ? doc.toObject() : doc,
        documentId: doc._id,
      });
    });

    if (isDebug) console.log(`[DEBUG] bulk insert took ${(performance.now() - start).toFixed(2)}ms`);

    return res.status(201).json({
      success: true,
      insertedCount: result.length,
      data: result,
    });
  } catch (err) {
    console.error(err);

    if (isDuplicateKeyError(err)) {
      return res.status(409).json({
        error: "Duplicate value violates unique constraint.",
        details: err.message,
      });
    }

    res.status(500).json({ error: err.message });
  }
};
Critical: ordered: false partial-success path silently loses state.
With ordered: false, insertMany throws a BulkWriteError on the first constraint violation while still inserting the remaining valid docs (they live on err.insertedDocs / err.result.insertedCount). The current catch maps that to a 409 Duplicate as if nothing was written, so for those successfully-persisted docs:
- Project.databaseUsed is never incremented, so usage accounting drifts below reality.
- dispatchWebhooks is never called, so downstream consumers miss insert events for real documents in the DB.
- The client gets a response implying no write happened, even though some records are live.
Either switch to ordered: true (so nothing is inserted on any failure), or keep ordered: false and handle BulkWriteError explicitly: compute insertedDocs, increment databaseUsed by their real byte size, dispatch webhooks for each, and return a 207-style partial-success payload that lists failed indices/reasons.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/public-api/src/controllers/data.controller.js` around lines 167 - 205,
The current insertMany call (Model.insertMany(..., { ordered: false })) can
throw a BulkWriteError while still persisting some docs, but the catch treats it
like a full 409 duplicate and skips accounting/webhooks/response updates; either
change to ordered: true to guarantee all-or-nothing, or explicitly handle
BulkWriteError: inspect err.insertedDocs / err.result.insertedCount, increment
Project.updateOne({ $inc: { databaseUsed: <size of insertedDocs> } }) for only
those inserted docs, call dispatchWebhooks for each doc in err.insertedDocs (use
doc.toObject ? doc.toObject() : doc), and return a partial-success response
(e.g., status 207 or a 200 with inserted and failed lists and details) instead
of a simple 409; also keep existing isDuplicateKeyError handling for pure
duplicate errors.
@yash-pouranik, no, this issue is not yet resolved in the current code. The bulkInsertData function still uses { ordered: false } and the catch block still only checks isDuplicateKeyError without any special handling for BulkWriteError.
Concretely, these three problems remain:
- databaseUsed drift: partially-inserted docs never increment Project.databaseUsed.
- Missed webhooks: dispatchWebhooks is never called for docs that were actually persisted.
- Misleading client response: the client receives a 409 implying nothing was written, even though some records are live in the DB.
To fix this, the catch block needs to explicitly handle BulkWriteError. Here's a concrete suggestion:
const result = await Model.insertMany(validData, { ordered: false });
// ... (success path unchanged)
} catch (err) {
console.error(err);
// BulkWriteError: some docs may have been inserted despite the error
if (err.name === 'MongoBulkWriteError' || err.name === 'BulkWriteError') {
const insertedDocs = err.insertedDocs ?? [];
if (!project.resources.db.isExternal && insertedDocs.length > 0) {
const insertedSize = Buffer.byteLength(JSON.stringify(insertedDocs));
await Project.updateOne(
{ _id: project._id },
{ $inc: { databaseUsed: insertedSize } },
);
}
insertedDocs.forEach((doc) => {
dispatchWebhooks({
projectId: project._id,
collection: collectionName,
action: 'insert',
document: doc.toObject ? doc.toObject() : doc,
documentId: doc._id,
});
});
return res.status(207).json({
success: false,
insertedCount: insertedDocs.length,
data: insertedDocs,
errors: err.writeErrors?.map((e) => ({
index: e.index,
message: e.errmsg,
})) ?? [],
});
}
if (isDuplicateKeyError(err)) {
return res.status(409).json({
error: "Duplicate value violates unique constraint.",
details: err.message,
});
}
res.status(500).json({ error: err.message });
}
Alternatively, the simpler fix is to switch to ordered: true (all-or-nothing semantics), which avoids partial-success entirely and keeps the current catch logic correct.
🧠 Learnings used
Learnt from: CR
Repo: geturbackend/urBackend PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-21T21:00:02.427Z
Learning: When changing collection behavior, check: schema validation in `packages/common`, model compilation in `packages/common/src/utils/injectModel.js`, dashboard create collection UI, and dashboard-api collection creation/index creation flow.
router.post(
  '/:collectionName/bulk',
  verifyApiKey,
  blockUsersCollectionDataAccess,
  checkUsageLimits,
  resolvePublicAuthContext,
  authorizeWriteOperation,
  bulkInsertData
);
Critical: authorizeWriteOperation is not array-aware — RLS is fully bypassed for bulk inserts.
The bulk route reuses authorizeWriteOperation, but that middleware treats req.body as a single object:
const incomingOwner = req.body?.[ownerField];
if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
req.body[ownerField] = authUserId; // on an Array this sets a property on the array, not on records
return next();
}
if (String(incomingOwner) !== authUserId) { ... } // never runs for arrays
With req.body being an array, incomingOwner is always undefined, so:
- The owner-mismatch check is skipped (a caller can embed any ownerField value in each record and it will pass through).
- Auto-assigning authUserId attaches it to the array object, not to each record, so the records are still missing the owner field (or retain whatever the caller sent).
Net effect: authenticated users can write records owned by other users via /bulk, a complete RLS bypass. Please update authorizeWriteOperation to branch on Array.isArray(req.body) and apply the owner check/auto-assignment to each element (and reject/normalize _id for each), before this route is shipped.
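A short illustration of why the auto-assignment is a no-op for arrays (ownerId stands in for whatever ownerField resolves to):

const body = [{ title: "a" }, { title: "b" }];
body["ownerId"] = "user_123";   // lands on the array object itself
console.log(body[0].ownerId);   // undefined: the records still have no owner
console.log(Object.keys(body[0])); // ["title"]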
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/public-api/src/routes/data.js` around lines 23 - 31,
authorizeWriteOperation currently assumes req.body is a single object; update it
to detect Array.isArray(req.body) and handle arrays by iterating over each
record: for each element, read ownerField, if missing/null/empty set it to
authUserId on the record (not on the array), and if present ensure
String(record[ownerField]) === authUserId else reject the request; additionally,
for array mode validate/normalize/_reject provided _id on each element the same
way you do for single-object mode (e.g., prevent clients from setting
conflicting IDs), and preserve existing single-object behavior when req.body is
not an array. Ensure the middleware still calls next() only if all records pass
these per-record checks.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
🧹 Nitpick comments (1)
apps/public-api/src/middlewares/authorizeWriteOperation.js (1)
60-83: Include the offending item index in array-mode error responses.
For array bodies, the loop fails fast on the first bad item or owner mismatch without telling the caller which index failed. The PR objective for /bulk explicitly aims to "return error messages for invalid records (including failed indices)", and the controller already does per-index reporting; surfacing the index here keeps middleware errors consistent and debuggable for bulk callers.
Also, since callers can now legitimately send an array, consider whether an empty array should be rejected here or deferred to the controller (currently it silently passes through to next()).
♻️ Suggested tweak
-    const bodyItems = Array.isArray(req.body) ? req.body : [req.body];
+    const isBulk = Array.isArray(req.body);
+    const bodyItems = isBulk ? req.body : [req.body];

-    for (const item of bodyItems) {
+    for (let i = 0; i < bodyItems.length; i++) {
+      const item = bodyItems[i];
       if (!item || typeof item !== 'object' || Array.isArray(item)) {
         return res.status(400).json({
           error: 'Invalid request body',
-          message: 'Request body must be an object or an array of objects.'
+          message: 'Request body must be an object or an array of objects.',
+          ...(isBulk ? { index: i } : {})
         });
       }

       const incomingOwner = item?.[ownerField];

       if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
         item[ownerField] = authUserId;
         continue;
       }

       if (String(incomingOwner) !== authUserId) {
         return res.status(403).json({
           error: 'RLS owner mismatch',
-          message: `You can only create documents with ${ownerField} equal to your user id.`
+          message: `You can only create documents with ${ownerField} equal to your user id.`,
+          ...(isBulk ? { index: i } : {})
         });
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/public-api/src/middlewares/authorizeWriteOperation.js` around lines 60 - 83, The middleware authorizeWriteOperation iterates bodyItems but returns generic errors without the offending array index and also silently allows an empty array; update the loop and error responses so that when req.body is an array you include the index (i) in both the 400 "Invalid request body" response and the 403 "RLS owner mismatch" response (use bodyItems and ownerField/authUserId to build the message), and add explicit handling for an empty array (either reject with 400 and a clear message including that it’s empty, or defer to the controller by calling next()—choose and implement the desired behavior consistently).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@apps/public-api/src/middlewares/authorizeWriteOperation.js`:
- Around line 60-83: The middleware authorizeWriteOperation iterates bodyItems
but returns generic errors without the offending array index and also silently
allows an empty array; update the loop and error responses so that when req.body
is an array you include the index (i) in both the 400 "Invalid request body"
response and the 403 "RLS owner mismatch" response (use bodyItems and
ownerField/authUserId to build the message), and add explicit handling for an
empty array (either reject with 400 and a clear message including that it’s
empty, or defer to the controller by calling next()—choose and implement the
desired behavior consistently).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a7910006-7abb-447b-8b2b-5987b5b359ae
📒 Files selected for processing (1)
apps/public-api/src/middlewares/authorizeWriteOperation.js
@yash-pouranik Please take a look when you have time. Thank you for the feedback!
@yash-pouranik Please review again. Thank you!
if (method === 'POST') {
  const incomingOwner = req.body?.[ownerField];
Why did u removed this - const incomingOwner = req.body?.[ownerField];
???
@yash-pouranik I removed that line because with the bulk insert endpoint, req.body can now be an array of records as well as a single object. So instead of reading req.body?.[ownerField] directly, the middleware now normalizes the body into a list of items and applies the owner check to each one. Let me know if you'd prefer a different approach.
@Priyanka0205-CSE ya thats okay. Please commit, I will check.
I dont know what comments u have worked on. But in the next commit please fix everything, and better to use the suggestions added.
Thanks for the feedback! I’ve now addressed all the review comments and improved the validation logic and error messages for clarity and consistency. Please review again. Let me know if anything else needs improvement.
Actionable comments posted: 2
🧹 Nitpick comments (2)
apps/public-api/src/controllers/data.controller.js (1)
137-156: Optional: short-circuit validation when an invalid item is found.
Since the contract is "any invalid item rejects the whole batch", the loop can return on the first failure instead of running validateData + sanitize for every remaining record (and uselessly populating validData, which is then discarded). For 100 items where the first is malformed, you currently still pay 99 schema validations. Switching to a for...of (or plain for) with an early return next(new AppError(...)) is both faster and clearer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/public-api/src/controllers/data.controller.js` around lines 137 - 156, The loop currently uses incomingData.forEach and continues validating/sanitizing after an invalid item is found, populating validData and invalidIndices unnecessarily; change the iteration to a for...of (or plain for) over incomingData, validate each item with validateData(item, schemaRules), and as soon as an invalid item is detected call return next(new AppError(`Invalid records at index: ${index}`, 400)) to short-circuit and avoid further calls to validateData and sanitize, removing the accumulated invalidIndices/validData logic since the contract rejects the whole batch on first failure.
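A minimal sketch of that early-return shape, reusing only the handler's existing helpers (validateData, sanitize, AppError) and assuming nothing beyond what the quoted controller already imports:

for (let index = 0; index < incomingData.length; index++) {
  const item = incomingData[index];

  // Reject the whole batch as soon as one record is malformed or fails schema validation.
  if (!item || typeof item !== "object" || Array.isArray(item)) {
    return next(new AppError(`Invalid records at index: ${index}`, 400));
  }

  const { error, cleanData } = validateData(item, schemaRules);
  if (error) {
    return next(new AppError(`Invalid records at index: ${index}`, 400));
  }

  validData.push(sanitize(cleanData));
}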
apps/public-api/src/middlewares/authorizeWriteOperation.js (1)
60-92: Inconsistent indentation in the new POST block.
The wrapping logic itself is correct (the [req.body] wrapper preserves the reference, so mutations like item[ownerField] = authUserId still propagate to a single-object body), but lines 62-92 mix 0/4/8/10-space indents inside a block whose surrounding code uses 12/16. Please re-indent so the new block lines up with line 60 and the existing if (ownerField === '_id') block above.
♻️ Suggested re-indent
            const bodyItems = Array.isArray(req.body) ? req.body : [req.body];
-  if (bodyItems.length === 0) {
-    return res.status(400).json({
-      error: 'Invalid request body',
-      message: 'Request body cannot be an empty array.'
-    });
-}
-
-for (let i = 0; i < bodyItems.length; i++) {
-  const item = bodyItems[i];
-
-  if (!item || typeof item !== 'object' || Array.isArray(item)) {
-    return res.status(400).json({
-      error: 'Invalid request body',
-      message: `Item at index ${i} must be a valid object`
-    });
-  }
-
-  const incomingOwner = item?.[ownerField];
-
-  if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
-    item[ownerField] = authUserId;
-    continue;
-  }
-
-  if (String(incomingOwner) !== authUserId) {
-    return res.status(403).json({
-      error: 'RLS owner mismatch',
-      message: `Item at index ${i} must have ${ownerField} equal to your user id`
-    });
-  }
-}
+            if (bodyItems.length === 0) {
+              return res.status(400).json({
+                error: 'Invalid request body',
+                message: 'Request body cannot be an empty array.'
+              });
+            }
+
+            for (let i = 0; i < bodyItems.length; i++) {
+              const item = bodyItems[i];
+
+              if (!item || typeof item !== 'object' || Array.isArray(item)) {
+                return res.status(400).json({
+                  error: 'Invalid request body',
+                  message: `Item at index ${i} must be a valid object`
+                });
+              }
+
+              const incomingOwner = item?.[ownerField];
+
+              if (incomingOwner === undefined || incomingOwner === null || incomingOwner === '') {
+                item[ownerField] = authUserId;
+                continue;
+              }
+
+              if (String(incomingOwner) !== authUserId) {
+                return res.status(403).json({
+                  error: 'RLS owner mismatch',
+                  message: `Item at index ${i} must have ${ownerField} equal to your user id`
+                });
+              }
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/public-api/src/middlewares/authorizeWriteOperation.js` around lines 60 - 92, The new POST body-handling block inside authorizeWriteOperation is mis-indented versus the surrounding code; reformat the block that declares bodyItems and the subsequent for-loop/validation so its indentation aligns with the existing block that starts at the if (ownerField === '_id') statement (match the 12/16-space nesting used there). Keep the same logic and variable names (bodyItems, item, incomingOwner, ownerField, authUserId) but adjust each line's leading spaces so the entire block shares the same indentation level as the surrounding code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 175-178: The catch block in the bulk-insert handler currently
always returns a 500; change it to detect duplicate-key errors (use
isDuplicateKeyError(err)) and route those to a 409 using AppError (do not expose
err.message), mirroring the pattern used by insertData and updateSingleData: if
isDuplicateKeyError(err) return next(new AppError("Duplicate key error", 409));
otherwise fall through to the existing next(new AppError("Failed to insert bulk
data", 500)); ensure you reference the same next and AppError usage as in this
controller.
- Around line 100-179: bulkInsertData currently skips quota checks, usage
accounting, and webhook dispatch present in insertData; fix by adding the same
pre-flight quota validation (check project.databaseUsed + totalSize >
project.databaseLimit) before calling Model.insertMany, incrementing
Project.databaseUsed by the total inserted document size and persisting the
project after successful insertMany, and dispatching dispatchWebhooks for each
inserted record (use action: 'insert' and include project/_id/collection info)
after insertion; locate and mirror the relevant logic from insertData (the
blocks around Model.create, Project.databaseUsed updates, and dispatchWebhooks)
and adapt them to work with Model.insertMany and the array of validData returned
by insertMany.
---
Nitpick comments:
In `@apps/public-api/src/controllers/data.controller.js`:
- Around line 137-156: The loop currently uses incomingData.forEach and
continues validating/sanitizing after an invalid item is found, populating
validData and invalidIndices unnecessarily; change the iteration to a for...of
(or plain for) over incomingData, validate each item with validateData(item,
schemaRules), and as soon as an invalid item is detected call return next(new
AppError(`Invalid records at index: ${index}`, 400)) to short-circuit and avoid
further calls to validateData and sanitize, removing the accumulated
invalidIndices/validData logic since the contract rejects the whole batch on
first failure.
In `@apps/public-api/src/middlewares/authorizeWriteOperation.js`:
- Around line 60-92: The new POST body-handling block inside
authorizeWriteOperation is mis-indented versus the surrounding code; reformat
the block that declares bodyItems and the subsequent for-loop/validation so its
indentation aligns with the existing block that starts at the if (ownerField ===
'_id') statement (match the 12/16-space nesting used there). Keep the same logic
and variable names (bodyItems, item, incomingOwner, ownerField, authUserId) but
adjust each line's leading spaces so the entire block shares the same
indentation level as the surrounding code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96857de1-258b-425a-9531-10ed713a603f
📒 Files selected for processing (2)
apps/public-api/src/controllers/data.controller.js
apps/public-api/src/middlewares/authorizeWriteOperation.js
// BULK INSERT DATA

module.exports.bulkInsertData = async (req, res, next) => {
  try {
    const MAX_BULK_INSERT_LIMIT = 100;

    const { collectionName } = req.params;
    const project = req.project;
    const incomingData = req.body;

    if (!Array.isArray(incomingData)) {
      return next(new AppError("Request body must be an array of objects", 400));
    }

    if (incomingData.length === 0) {
      return next(new AppError("Request body cannot be empty", 400));
    }

    if (incomingData.length > MAX_BULK_INSERT_LIMIT) {
      return next(
        new AppError(`Maximum ${MAX_BULK_INSERT_LIMIT} records allowed`, 400)
      );
    }

    const collectionConfig = project.collections.find(
      (c) => c.name === collectionName
    );

    if (!collectionConfig) {
      return next(new AppError("Collection not found", 404));
    }

    const schemaRules = collectionConfig.model;

    const validData = [];
    const invalidIndices = [];

    incomingData.forEach((item, index) => {
      if (!item || typeof item !== "object" || Array.isArray(item)) {
        invalidIndices.push(index);
        return;
      }

      const { error, cleanData } = validateData(item, schemaRules);

      if (error) {
        invalidIndices.push(index);
      } else {
        validData.push(sanitize(cleanData));
      }
    });

    if (invalidIndices.length > 0) {
      return next(
        new AppError(`Invalid records at index: ${invalidIndices.join(", ")}`, 400)
      );
    }

    const connection = await getConnection(project._id);
    const Model = getCompiledModel(
      connection,
      collectionConfig,
      project._id,
      project.resources.db.isExternal
    );

    const result = await Model.insertMany(validData, { ordered: true });

    return res.status(201).json({
      success: true,
      data: {
        insertedCount: result.length,
      },
      message: "Bulk insert successful",
    });
  } catch (err) {
    console.error(err);
    return next(new AppError("Failed to insert bulk data", 500));
  }
};
Critical: bulk handler skips quota enforcement, usage accounting, and webhook dispatch.
bulkInsertData doesn't replicate the three side effects that insertData performs around Model.create (lines 51‑57, 69‑74, 76‑82). Compared to a 100x single-insert flow, this bulk path silently:
- Bypasses the database limit: there is no pre-flight project.databaseUsed + size > project.databaseLimit check, so a caller can drive any project past its plan quota in one request.
- Loses usage accounting: Project.databaseUsed is never incremented after insertMany, so subsequent quota checks underestimate real storage usage indefinitely (drift is permanent until reseeded).
- Skips webhooks: dispatchWebhooks({ action: 'insert', ... }) is never called for any of the inserted docs, so downstream consumers miss insert events for real records that exist in the DB. This silently breaks integrations (zaps, audit pipelines, sync mirrors, etc.).
All three are reachable on the happy path. Please mirror the insertData handler:
🛡️ Suggested fix (parity with insertData)
if (invalidIndices.length > 0) {
return next(
new AppError(`Invalid records at index: ${invalidIndices.join(", ")}`, 400)
);
}
+ let totalSize = 0;
+ if (!project.resources.db.isExternal) {
+ totalSize = Buffer.byteLength(JSON.stringify(validData));
+ if ((project.databaseUsed || 0) + totalSize > project.databaseLimit) {
+ return next(new AppError("Database limit exceeded.", 403));
+ }
+ }
+
const connection = await getConnection(project._id);
const Model = getCompiledModel(
connection,
collectionConfig,
project._id,
project.resources.db.isExternal
);
const result = await Model.insertMany(validData, { ordered: true });
+ if (!project.resources.db.isExternal) {
+ await Project.updateOne(
+ { _id: project._id },
+ { $inc: { databaseUsed: totalSize } },
+ );
+ }
+
+ result.forEach((doc) => {
+ dispatchWebhooks({
+ projectId: project._id,
+ collection: collectionName,
+ action: 'insert',
+ document: doc.toObject ? doc.toObject() : doc,
+ documentId: doc._id,
+ });
+ });
+
return res.status(201).json({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/public-api/src/controllers/data.controller.js` around lines 100 - 179,
bulkInsertData currently skips quota checks, usage accounting, and webhook
dispatch present in insertData; fix by adding the same pre-flight quota
validation (check project.databaseUsed + totalSize > project.databaseLimit)
before calling Model.insertMany, incrementing Project.databaseUsed by the total
inserted document size and persisting the project after successful insertMany,
and dispatching dispatchWebhooks for each inserted record (use action: 'insert'
and include project/_id/collection info) after insertion; locate and mirror the
relevant logic from insertData (the blocks around Model.create,
Project.databaseUsed updates, and dispatchWebhooks) and adapt them to work with
Model.insertMany and the array of validData returned by insertMany.
} catch (err) {
  console.error(err);
  return next(new AppError("Failed to insert bulk data", 500));
}
Catch block masks duplicate-key errors as 500.
Every other write handler in this file (insertData line 89, updateSingleData line 460) routes isDuplicateKeyError(err) to a 409. Here the catch unconditionally returns a generic 500 "Failed to insert bulk data", so a single unique-constraint violation in the batch becomes indistinguishable from an actual server error — clients can't disambiguate "fix your payload" from "retry later", and ordered:true means the failing index is recoverable info you're discarding.
Please branch on isDuplicateKeyError(err) (without exposing err.message, per the controller guideline) before falling through to the generic 500.
🛡️ Suggested fix
} catch (err) {
console.error(err);
+ if (isDuplicateKeyError(err)) {
+ return next(new AppError("Duplicate value violates unique constraint.", 409));
+ }
return next(new AppError("Failed to insert bulk data", 500));
}
As per coding guidelines: "All API endpoints return: { success: bool, data: {}, message: "" }. Use AppError class for errors — never raw throw, never expose MongoDB errors to client."
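The AppError class itself is never shown in this thread; a typical shape consistent with how it is used here would be something like the following (an assumption — the repo's actual implementation may differ):

class AppError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.statusCode = statusCode; // consumed by the shared error-handling middleware
    this.isOperational = true;    // marks expected, user-facing failures
    Error.captureStackTrace(this, this.constructor);
  }
}

module.exports = AppError;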
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/public-api/src/controllers/data.controller.js` around lines 175 - 178,
The catch block in the bulk-insert handler currently always returns a 500;
change it to detect duplicate-key errors (use isDuplicateKeyError(err)) and
route those to a 409 using AppError (do not expose err.message), mirroring the
pattern used by insertData and updateSingleData: if isDuplicateKeyError(err)
return next(new AppError("Duplicate key error", 409)); otherwise fall through to
the existing next(new AppError("Failed to insert bulk data", 500)); ensure you
reference the same next and AppError usage as in this controller.
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. The branch was updated while autofix was in progress. Please try again.
@Priyanka0205-CSE
also please update ur branch with urbackend's Main.
@Priyanka0205-CSE coderabbit comments are still not resolved
What’s this PR about?
This PR introduces a new bulk insert endpoint that allows multiple records to be added in a single request. The goal is to improve performance by reducing the number of API calls required for large data operations.
What I implemented
- Added a new endpoint: /api/data/:collectionName/bulk
- Implemented bulkInsertData using MongoDB's insertMany for efficient insertion
- Added validation to ensure the request body is an array of valid objects
- Returned proper error messages for invalid records, including failed indices (see the example below)
- Integrated existing middlewares (API key validation, usage limits, auth, etc.)
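For illustration, the failure contract looks roughly like this (URL, header name, collection, and schema are assumptions; the message format comes from the controller above):

// Batch where the second record violates the collection schema (e.g. title must be a string).
const res = await fetch("https://api.example.com/api/data/posts/bulk", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-api-key": "<project-api-key>" },
  body: JSON.stringify([{ title: "valid" }, { title: 123 }]),
});
console.log(res.status);       // 400
console.log(await res.json()); // roughly { success: false, data: {}, message: "Invalid records at index: 1" }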
Why this is useful
Supports high-volume data operations (100+ records in one request)
Reduces network overhead and improves performance
Useful for migrations, batch uploads, and analytics use cases
How I tested it
Requests to the new endpoint returned 401 Unauthorized due to API key validation, confirming:
✔ route is correctly configured
✔ endpoint is reachable