feat: bulk document import via executeBulkOperations #3145

Open
bk201- wants to merge 2 commits into main from dev/dshilov/bulk-import-documents

@bk201- bk201- commented Jun 11, 2026

Contributor

Summary

Replaces the per-document items.create loop in importDocuments with chunked bulk insertion via executeBulkOperations (the modern, non-deprecated @azure/cosmos API — same pattern used in DocumentSession.processBulkDeleteBatch).

Changes

  • Bulk insert documents in chunks of 100 via items.executeBulkOperations.
  • Throttling-aware: throttled (429) documents are retried up to 10 times, honoring x-ms-retry-after-ms with backoff.
  • Phase-2 fallback: any document that fails the bulk path (including on emulators that do not support executeBulkOperations) is retried one by one.
  • Auto-generates id (UUID) for documents missing one.
  • Cancellable progress (cancellable: true + CancellationToken).
  • Correct partition key extraction via extractPartitionKey.
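The chunking, id generation, throttling retry, and fallback steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `bulkInsert`, `insertOne`, `OpResult`, `importInChunks`, the backoff constants, and the rule for what counts as a missing id are all hypothetical stand-ins, and the sketch assumes the bulk call returns one result per document in input order. Partition key extraction via `extractPartitionKey` is left out.

```typescript
import { randomUUID } from "node:crypto";

type Doc = Record<string, unknown>;

// Hypothetical per-document result of one bulk call, assumed to be in input order.
interface OpResult {
  statusCode: number;
  retryAfterMs?: number; // parsed from x-ms-retry-after-ms when present
}

// Hypothetical callbacks standing in for the real SDK calls.
type BulkInsert = (docs: Doc[]) => Promise<OpResult[]>;
type InsertOne = (doc: Doc) => Promise<void>;

const CHUNK_SIZE = 100;
const MAX_THROTTLE_RETRIES = 10;
const BASE_DELAY_MS = 100; // assumed backoff base
const MAX_DELAY_MS = 5_000; // assumed backoff cap

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Assumption: an id counts as missing when it is undefined, null, or "".
function ensureId(doc: Doc): Doc {
  const id = doc["id"];
  return id === undefined || id === null || id === "" ? { ...doc, id: randomUUID() } : doc;
}

// Wait at least as long as the server asked, otherwise back off exponentially.
function retryDelayMs(attempt: number, retryAfterMs?: number): number {
  const backoff = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  return Math.max(retryAfterMs ?? 0, backoff);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function importInChunks(
  docs: Doc[],
  bulkInsert: BulkInsert,
  insertOne: InsertOne,
  isCancelled: () => boolean = () => false,
): Promise<{ inserted: number; failed: Doc[] }> {
  let inserted = 0;
  const failed: Doc[] = [];
  for (const batch of chunk(docs.map(ensureId), CHUNK_SIZE)) {
    if (isCancelled()) break; // stop between chunks when the user cancels
    let pending = batch;
    const fallback: Doc[] = [];
    // Phase 1: bulk insert, retrying only the throttled (429) documents.
    for (let attempt = 0; pending.length > 0 && attempt <= MAX_THROTTLE_RETRIES; attempt++) {
      const results = await bulkInsert(pending).catch(() => undefined);
      if (results === undefined) break; // bulk path unavailable: pending goes to phase 2
      const throttled: Doc[] = [];
      let waitMs = 0;
      results.forEach((result, i) => {
        const doc = pending[i];
        if (doc === undefined) return;
        if (result.statusCode === 429) {
          throttled.push(doc);
          waitMs = Math.max(waitMs, retryDelayMs(attempt, result.retryAfterMs));
        } else if (result.statusCode >= 200 && result.statusCode < 300) {
          inserted++;
        } else {
          fallback.push(doc);
        }
      });
      pending = throttled;
      if (pending.length > 0 && attempt < MAX_THROTTLE_RETRIES) await sleep(waitMs);
    }
    // Phase 2: one-by-one retry for failures and still-throttled documents.
    for (const doc of [...fallback, ...pending]) {
      try {
        await insertOne(doc);
        inserted++;
      } catch {
        failed.push(doc);
      }
    }
  }
  return { inserted, failed };
}
```

In the extension itself, `bulkInsert` would presumably wrap items.executeBulkOperations and `insertOne` would wrap items.create; their exact request and response shapes are not reproduced here. Injecting them as callbacks keeps the control flow testable without a live account or emulator.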

Why not #2692

This supersedes #2692 for the CosmosDB NoSQL path and that PR can be closed:

  • #2692 (Import Document with Buffer) used the deprecated items.bulk(), which is limited to 100 operations per call and had no NoSQL throttling handling.
  • Its generic auto-flush buffer was over-engineered for the in-memory case (all documents are already materialized into a single array before insert, so simple chunking is enough).
  • It was based on an outdated import architecture (getCosmosClient + Mongo branch), whereas import is now CosmosDB-only and uses withClaimsChallengeHandling.

Follow-up

The buffering idea from #2692 is valuable for a streaming import scenario (large JSONL/NDJSON files that should not be loaded fully into memory). Tracked separately in #3144.
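For context, the streaming case might look something like the sketch below: lines are read one at a time and flushed in fixed-size batches, so the whole file never sits in memory. This is only an illustration of the idea, not the design from #2692 or #3144; `BatchBuffer`, `importNdjson`, and `insertBatch` are hypothetical names, and the default capacity of 100 is an assumption borrowed from the chunk size in this PR.

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

type Doc = Record<string, unknown>;

// Hypothetical fixed-capacity buffer: add() hands back a full batch
// once the capacity is reached, otherwise undefined.
class BatchBuffer<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  add(item: T): T[] | undefined {
    this.items.push(item);
    return this.items.length >= this.capacity ? this.drain() : undefined;
  }

  // Returns whatever is buffered and resets the buffer.
  drain(): T[] {
    const batch = this.items;
    this.items = [];
    return batch;
  }
}

// Reads an NDJSON/JSONL file line by line and inserts it batch by batch.
// insertBatch is a hypothetical callback standing in for the bulk insert path.
async function importNdjson(
  path: string,
  insertBatch: (docs: Doc[]) => Promise<void>,
  capacity = 100,
): Promise<number> {
  const buffer = new BatchBuffer<Doc>(capacity);
  const lines = createInterface({
    input: createReadStream(path, { encoding: "utf8" }),
    crlfDelay: Infinity,
  });
  let count = 0;
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    const full = buffer.add(JSON.parse(line) as Doc);
    if (full) {
      await insertBatch(full);
      count += full.length;
    }
  }
  const rest = buffer.drain(); // flush the final partial batch
  if (rest.length > 0) {
    await insertBatch(rest);
    count += rest.length;
  }
  return count;
}
```

Keeping the buffer synchronous and leaving the awaiting to the caller means at most one batch is in flight at a time, which bounds memory use at roughly one batch of parsed documents.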

@bk201- bk201- requested a review from a team as a code owner June 11, 2026 09:08
@github-actions

Contributor

🎭 E2E Tests (Playwright + VS Code)

Commit: 6dd2939
Pull Request: #3145 feat: bulk document import via executeBulkOperations

🧪 Result

  • E2E Tests: ✅ success

📥 Artifacts (run)

Tip: the HTML report artifact contains a self-contained Playwright report.
Download the zip, extract, and open index.html — or run
npx playwright show-report <extracted-dir> for the interactive view.

@github-actions

Contributor

🔨 Build, Lint & Test

🔗 Source

📦 Package Information

🧪 Test Results

  • Unit Tests: ✅ success
  • Integration Tests (extension host): ✅ success

📥 Artifacts (run)

✅ Build Status

Build and local tests passed. See sibling comments below for E2E and NoSQL integration results.
