fix: chunk SEO IN clause to stay within D1 SQL variable limit#422
fix: chunk SEO IN clause to stay within D1 SQL variable limit#422baezor wants to merge 6 commits intoemdash-cms:mainfrom
Conversation
Fixes emdash-cms#219. hydrateEntryBylines builds unbounded IN (?, ?, …) clauses that exceed Cloudflare D1's bound-parameter limit on large collections. Adds a chunks() utility and applies it defense-in-depth at the repository level: getContentBylinesMany, findByUserIds, and getAuthorIds now batch IDs in groups of 50.
Deduplicates contentIds in getContentBylinesMany to prevent duplicate credits when the same ID appears across chunk boundaries. Adds tests for the duplication edge case and an end-to-end getBylinesForEntries test spanning both explicit and inferred byline paths.
🦋 Changeset detectedLatest commit: d8a56cc The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
All contributors have signed the CLA ✍️ ✅ |
There was a problem hiding this comment.
Pull request overview
This PR addresses Cloudflare D1’s bound-parameter limit by chunking large IN (...) queries (notably SEO hydration via SeoRepository.getMany), building on the chunking utilities and byline batching introduced in #223.
Changes:
- Added
chunks()+SQL_BATCH_SIZEutility to capIN (...)batch sizes. - Updated
SeoRepository.getManyto dedupe IDs and query in chunks while returning defaults for missing rows. - Added/updated unit tests covering chunking behavior and large-batch SEO/byline hydration, plus patch changesets.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/utils/chunks.ts | Introduces chunking helper and batch-size constant for SQL IN (...) queries. |
| packages/core/src/database/repositories/seo.ts | Chunks content_id IN (...) in getMany to avoid D1 SQL variable limit. |
| packages/core/src/database/repositories/byline.ts | Chunks IN (...) queries for byline hydration (from #223). |
| packages/core/src/bylines/index.ts | Chunks raw SQL id IN (...) author-id lookup (from #223). |
| packages/core/tests/unit/utils/chunks.test.ts | Adds unit tests for chunk helper + SQL_BATCH_SIZE. |
| packages/core/tests/unit/database/repositories/seo.test.ts | New tests for chunked SEO hydration behavior. |
| packages/core/tests/unit/database/repositories/byline.test.ts | Adds tests for chunked byline repository methods (from #223). |
| packages/core/tests/unit/bylines/bylines-query.test.ts | Adds higher-volume bylines query test (from #223). |
| .changeset/cute-eagles-rescue.md | Changeset for byline/D1 batching fix (from #223). |
| .changeset/brave-seals-hydrate.md | Changeset for SEO/D1 batching fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export function chunks<T>(arr: T[], size: number): T[][] { | ||
| if (arr.length === 0) return []; | ||
| const result: T[][] = []; | ||
| for (let i = 0; i < arr.length; i += size) { | ||
| result.push(arr.slice(i, i + size)); | ||
| } |
There was a problem hiding this comment.
chunks() can enter an infinite loop when called with size <= 0 because i += size will never advance (or will go backwards). Since this helper is exported, add input validation (e.g., throw for non-positive sizes, or clamp to 1) to prevent hangs at runtime.
There was a problem hiding this comment.
This function is from #223 (the base branch this PR stacks on), not from this PR's diff. The concern is valid in the abstract but in practice the only call sites use the exported SQL_BATCH_SIZE constant (50), so size <= 0 can't happen at runtime. If the maintainers want a guard, it belongs in #223 alongside the function definition.
SeoRepository.getMany builds a WHERE content_id IN (?, ?, ...)
clause alongside a collection = ? filter. On Cloudflare D1, which
caps bound parameters at 100 per query, passing 100 content ids
produces 101 parameters and trips the limit:
D1_ERROR: too many SQL variables at offset 369: SQLITE_ERROR
This is the same root cause as the byline hydration fix in the
sibling commit, but on a different repository that wasn't covered
there. SeoRepository.getMany is called from handleContentList
before hydrateBylinesMany, so on any collection with has_seo = 1
and >= 99 items, it's the first function to fail on the admin
content list endpoint.
Apply the same chunking pattern using the shared chunks() helper
and SQL_BATCH_SIZE constant. Deduplicate contentIds before
chunking for consistency with the byline fix. Pre-fill result
with defaults so the two-pass resolve-then-fill-missing logic
collapses to a single pass.
Adds unit tests covering:
- input size larger than SQL_BATCH_SIZE, real ids spread across chunks
- all-missing ids get defaults
- duplicate input ids resolve cleanly without duplicate rows
Repro of the underlying D1 limit for the record:
wrangler d1 execute <db> --remote --command \
"SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,...x100)"
-> too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]
fe5e01b to
d8a56cc
Compare
Overlapping PRsThis PR modifies files that are also changed by other open PRs: This may cause merge conflicts or duplicated work. A maintainer will coordinate. |
Follow-up to #223 — same root cause, different repository.
What does this PR do?
Fixes the same unbounded
IN (?, ?, …)pattern — this time inSeoRepository.getMany— that trips Cloudflare D1's 100-parameter limit on collections with ≥99 items.SeoRepository.getManybuilds:With 100 content ids that's 101 bound params → D1 rejects with:
This matters because
SeoRepository.getManyis called fromhandleContentListbeforehydrateBylinesMany:So on any collection with
has_seo = 1and ≥99 items, it's the first function to fail on the admin content list endpoint — #223 on its own would still leave those installs broken. I hit this at prepain.mx with 124 posts andhas_seo = 1on the posts collection; see the thread on #223 for the field repro.Changes specific to this PR (on top of #223)
packages/core/src/database/repositories/seo.tschunks, SQL_BATCH_SIZEfrom../../utils/chunks.jscontentIdsbefore chunking (consistent withgetContentBylinesManyin fix: chunk byline IN clauses to stay within D1 SQL variable limit #223)resultwith defaults, then override per-chunk — single pass, no intermediate mappackages/core/tests/unit/database/repositories/seo.test.ts(new)getMany handles more IDs than SQL_BATCH_SIZE— real ids spread across multiple chunksgetMany returns defaults for every input id when no rows existgetMany deduplicates repeated content IDs without duplicate rows.changeset/brave-seals-hydrate.md(patch)Type of change
Checklist
tests/unit/database/repositories,tests/unit/utils,tests/unit/bylinespass (chunks 7 + seo 3 new + byline 8 + bylines-query 11 + content 47)INchunking pattern established by fix: chunk byline IN clauses to stay within D1 SQL variable limit #223AI-generated code disclosure
Repro
Test output (SEO fix + PR #223 byline suite)