fix: chunk byline IN clauses to stay within D1 SQL variable limit #223
baezor wants to merge 4 commits into emdash-cms:main
Fixes emdash-cms#219. hydrateEntryBylines builds unbounded IN (?, ?, …) clauses that exceed Cloudflare D1's bound-parameter limit on large collections. Adds a chunks() utility and applies it defense-in-depth at the repository level: getContentBylinesMany, findByUserIds, and getAuthorIds now batch IDs in groups of 50.
Deduplicates contentIds in getContentBylinesMany to prevent duplicate credits when the same ID appears across chunk boundaries. Adds tests for the duplication edge case and an end-to-end getBylinesForEntries test spanning both explicit and inferred byline paths.
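The PR's actual utils/chunks.ts is not shown in this conversation, so the following is only a minimal sketch of what a chunks() helper along these lines might look like. The signature and the SQL_BATCH_SIZE constant name are assumptions for illustration; only the batch size of 50 comes from the description above.

```typescript
// Hypothetical sketch; the real utils/chunks.ts may differ.
// Assumed constant name. The value 50 is the batch size stated in the PR description.
const SQL_BATCH_SIZE = 50;

/** Splits `items` into consecutive arrays of at most `size` elements. */
function chunks<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError(`chunk size must be a positive integer, got ${size}`);
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// 120 IDs become three IN (...) lists of 50, 50, and 20 placeholders.
const exampleIds = Array.from({ length: 120 }, (_, i) => `id-${i}`);
const batchSizes = chunks(exampleIds, SQL_BATCH_SIZE).map((batch) => batch.length);
```

Rejecting a non-positive size up front avoids a loop that never advances, which seems like the safer default for a helper that sits in front of every query.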
🦋 Changeset detected
Latest commit: fddf1ee
The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages.
All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
Confirming this addresses a real production bug: we just hit it at prepain.mx with

Minimal standalone repro of the D1 limit itself (just to isolate the constraint from emdash):

    wrangler d1 execute <db> --remote --command \
      "SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,…×100)"
    # → too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]

Per the D1 limits docs, D1 caps bound parameters at 100 per query, far below SQLite upstream's default of 32,766. Any

Downstream (React #300 "rendered fewer hooks than expected") is a symptom: the admin bundle throws on the 500 response and the hook count goes inconsistent during the re-render. Fixing the query fixes the crash.

Heads-up: one uncovered site

While tracing the failure in prod we noticed that

    const rows = await this.db
      .selectFrom("_emdash_seo")
      .selectAll()
      .where("collection", "=", collection)
      .where("content_id", "in", contentIds) // ← 100 ids + 1 collection = 101 params
      .execute();

And it's called from

    // packages/core/src/api/handlers/content.ts:257-259
    const hasSeo = await collectionHasSeo(db, collection);
    await hydrateSeoMany(db, collection, result.items, hasSeo);
    await hydrateBylinesMany(db, collection, result.items);

So any collection with

Happy to open a follow-up PR that reuses the new

(As a short-term unblock we shipped a local
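For call sites like the SEO query above, where the IN list shares the parameter budget with other bound values, the arithmetic can be sketched as follows. This is an illustration only, not the emdash API: D1_MAX_BOUND_PARAMS, idsPerQuery, queryInBatches, and fetchBatch are all hypothetical names, and the limit of 100 is taken from the comment above.

```typescript
// Assumption: D1 allows at most 100 bound parameters per query, as stated in the comment above.
const D1_MAX_BOUND_PARAMS = 100;

/** Number of IDs that fit in one IN (...) list after the query's other bound parameters are counted. */
function idsPerQuery(fixedParams: number, limit: number = D1_MAX_BOUND_PARAMS): number {
  const budget = limit - fixedParams;
  if (!Number.isInteger(fixedParams) || fixedParams < 0 || budget < 1) {
    throw new RangeError(`no room for IDs with ${fixedParams} fixed parameters`);
  }
  return budget;
}

/** Calls `fetchBatch` once per slice of `ids`, in order, and concatenates the returned rows. */
async function queryInBatches<Id, Row>(
  ids: readonly Id[],
  batchSize: number,
  fetchBatch: (batch: Id[]) => Promise<Row[]>, // hypothetical callback that runs one bounded query
): Promise<Row[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batch size must be a positive integer, got ${batchSize}`);
  }
  const rows: Row[] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    rows.push(...(await fetchBatch(ids.slice(i, i + batchSize))));
  }
  return rows;
}
```

With one bound `collection` value, idsPerQuery(1) leaves room for 99 IDs, so the 100-ID page from the example would run as two queries (99 IDs, then 1) instead of a single 101-parameter query. A fixed batch size of 50, as this PR uses, stays under the limit as long as a query carries no more than 50 other bound parameters.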
Overlapping PRs
This PR modifies files that are also changed by other open PRs:
This may cause merge conflicts or duplicated work. A maintainer will coordinate.
What does this PR do?
Fixes unbounded IN (?, ?, …) clauses in byline hydration that exceed Cloudflare D1's SQL bound-parameter limit when querying large collections.
Adds a chunks() utility (utils/chunks.ts) and applies it defense-in-depth at the repository level so any caller is protected:
- BylineRepository.getContentBylinesMany: deduplicates IDs, then chunks content_id IN (…)
- BylineRepository.findByUserIds: chunks user_id IN (…)
- getAuthorIds (bylines/index.ts): chunks id IN (…) raw SQL
Each batch is capped at 50 IDs, well within D1's limit. Content IDs are also deduplicated before chunking to prevent duplicate credits when the same ID spans multiple chunks.
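To illustrate why deduplication matters once queries are batched, here is a small in-memory sketch. Everything in it is hypothetical (the Credit row shape, the creditTable data, and the lookupCredits and collectCredits helpers); it only models the general effect of the same ID landing in two batches, not the repository code itself.

```typescript
// Hypothetical row shape and in-memory "table"; not the emdash schema.
type Credit = { contentId: string; bylineId: string };
const creditTable: Credit[] = [
  { contentId: "post-a", bylineId: "byline-1" },
  { contentId: "post-b", bylineId: "byline-2" },
];

// Stand-in for one `content_id IN (...)` query: each matching row is returned once per call.
function lookupCredits(batch: readonly string[]): Credit[] {
  return creditTable.filter((row) => batch.indexOf(row.contentId) !== -1);
}

/** Looks up credits batch by batch and groups them by content ID. */
function collectCredits(
  contentIds: readonly string[],
  batchSize: number,
  dedupe: boolean,
): Map<string, Credit[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batch size must be a positive integer, got ${batchSize}`);
  }
  // A Set keeps first-seen order, so deduplicating does not reorder the IDs.
  const ids = dedupe ? Array.from(new Set(contentIds)) : contentIds.slice();
  const byContent = new Map<string, Credit[]>();
  for (let i = 0; i < ids.length; i += batchSize) {
    for (const credit of lookupCredits(ids.slice(i, i + batchSize))) {
      const list = byContent.get(credit.contentId) ?? [];
      list.push(credit);
      byContent.set(credit.contentId, list);
    }
  }
  return byContent;
}

// "post-a" is requested twice and, with a batch size of 2, lands in two different batches.
const requested = ["post-a", "post-b", "post-a"];
const creditsWithoutDedupe = collectCredits(requested, 2, false).get("post-a")?.length; // 2
const creditsWithDedupe = collectCredits(requested, 2, true).get("post-a")?.length; // 1
```

A single unchunked IN (...) query returns each matching row once regardless of repeated IDs, so the duplicate only appears after batching is introduced; deduplicating first restores the original behavior.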
Closes #219
Type of change
Checklist
- pnpm typecheck passes
- pnpm --silent lint:json | jq '.diagnostics | length' returns 0
- pnpm test passes (or targeted tests for my change)
- pnpm format has been run
AI-generated code disclosure
Screenshots / test output
All 26 tests pass across 3 test files (10 new):