fix: chunk SEO IN clause to stay within D1 SQL variable limit by baezor · Pull Request #422 · emdash-cms/emdash

baezor · 2026-04-09T23:47:46Z

Follow-up to #223 — same root cause, different repository.

Note on the diff: this PR is branched on top of fix/byline-sql-batch-chunking (#223) because it reuses the chunks() helper and SQL_BATCH_SIZE constant introduced there. Until #223 merges, the diff here will include #223's changes too. Once #223 lands, I'll rebase onto main and the diff will collapse to just the SEO fix and its tests. Please land #223 first.

What does this PR do?

Fixes the same unbounded IN (?, ?, …) pattern — this time in SeoRepository.getMany — that trips Cloudflare D1's 100-parameter limit on collections with ≥99 items.

SeoRepository.getMany builds:

.selectFrom("_emdash_seo")
.where("collection", "=", collection)       // 1 bound param
.where("content_id", "in", contentIds)      // N bound params

With 100 content ids that's 101 bound params → D1 rejects with:

Content list error: Error: D1_ERROR: too many SQL variables at offset 369: SQLITE_ERROR

This matters because SeoRepository.getMany is called from handleContentList before hydrateBylinesMany:

// packages/core/src/api/handlers/content.ts:257-259
const hasSeo = await collectionHasSeo(db, collection);
await hydrateSeoMany(db, collection, result.items, hasSeo);
await hydrateBylinesMany(db, collection, result.items);

So on any collection with has_seo = 1 and ≥99 items, it's the first function to fail on the admin content list endpoint — #223 on its own would still leave those installs broken. I hit this at prepain.mx with 124 posts and has_seo = 1 on the posts collection; see the thread on #223 for the field repro.

Changes specific to this PR (on top of #223)

packages/core/src/database/repositories/seo.ts
- Import chunks, SQL_BATCH_SIZE from ../../utils/chunks.js
- Dedupe contentIds before chunking (consistent with getContentBylinesMany in fix: chunk byline IN clauses to stay within D1 SQL variable limit #223)
- Pre-fill result with defaults, then override per-chunk — single pass, no intermediate map
- Updated JSDoc noting the D1 motivation
packages/core/tests/unit/database/repositories/seo.test.ts (new)
- getMany handles more IDs than SQL_BATCH_SIZE — real ids spread across multiple chunks
- getMany returns defaults for every input id when no rows exist
- getMany deduplicates repeated content IDs without duplicate rows
.changeset/brave-seals-hydrate.md (patch)

Type of change

Checklist

All 76 unit tests in tests/unit/database/repositories, tests/unit/utils, tests/unit/bylines pass (chunks 7 + seo 3 new + byline 8 + bylines-query 11 + content 47)
Added tests for my changes
Added a changeset
Matches the IN chunking pattern established by fix: chunk byline IN clauses to stay within D1 SQL variable limit #223

AI-generated code disclosure

This PR includes AI-generated code (drafted in-session while debugging a production incident, reviewed and tested locally)

Repro

wrangler d1 execute <db> --remote --command \
  "SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,…×100)"
# → too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]

Test output (SEO fix + PR #223 byline suite)

 ✓ tests/unit/utils/chunks.test.ts (7 tests) 2ms
 ✓ tests/unit/database/repositories/seo.test.ts (3 tests) 68ms
 ✓ tests/unit/database/repositories/byline.test.ts (8 tests) 141ms
 ✓ tests/unit/bylines/bylines-query.test.ts (11 tests) 197ms
 ✓ tests/unit/database/repositories/content.test.ts (47 tests) 665ms

 Test Files  5 passed (5)
      Tests  76 passed (76)

Fixes emdash-cms#219. hydrateEntryBylines builds unbounded IN (?, ?, …) clauses that exceed Cloudflare D1's bound-parameter limit on large collections. Adds a chunks() utility and applies it defense-in-depth at the repository level: getContentBylinesMany, findByUserIds, and getAuthorIds now batch IDs in groups of 50.

Deduplicates contentIds in getContentBylinesMany to prevent duplicate credits when the same ID appears across chunk boundaries. Adds tests for the duplication edge case and an end-to-end getBylinesForEntries test spanning both explicit and inferred byline paths.

changeset-bot · 2026-04-09T23:47:51Z

🦋 Changeset detected

Latest commit: d8a56cc

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages

Name	Type
emdash	Patch
@emdash-cms/cloudflare	Patch
@emdash-cms/plugin-ai-moderation	Patch
@emdash-cms/plugin-atproto	Patch
@emdash-cms/plugin-audit-log	Patch
@emdash-cms/plugin-color	Patch
@emdash-cms/plugin-embeds	Patch
@emdash-cms/plugin-forms	Patch
@emdash-cms/plugin-webhook-notifier	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

github-actions · 2026-04-09T23:47:58Z

All contributors have signed the CLA ✍️ ✅
_{Posted by the CLA Assistant Lite bot.}

Copilot

Pull request overview

This PR addresses Cloudflare D1’s bound-parameter limit by chunking large IN (...) queries (notably SEO hydration via SeoRepository.getMany), building on the chunking utilities and byline batching introduced in #223.

Changes:

Added chunks() + SQL_BATCH_SIZE utility to cap IN (...) batch sizes.
Updated SeoRepository.getMany to dedupe IDs and query in chunks while returning defaults for missing rows.
Added/updated unit tests covering chunking behavior and large-batch SEO/byline hydration, plus patch changesets.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
packages/core/src/utils/chunks.ts	Introduces chunking helper and batch-size constant for SQL `IN (...)` queries.
packages/core/src/database/repositories/seo.ts	Chunks `content_id IN (...)` in `getMany` to avoid D1 SQL variable limit.
packages/core/src/database/repositories/byline.ts	Chunks `IN (...)` queries for byline hydration (from #223).
packages/core/src/bylines/index.ts	Chunks raw SQL `id IN (...)` author-id lookup (from #223).
packages/core/tests/unit/utils/chunks.test.ts	Adds unit tests for chunk helper + `SQL_BATCH_SIZE`.
packages/core/tests/unit/database/repositories/seo.test.ts	New tests for chunked SEO hydration behavior.
packages/core/tests/unit/database/repositories/byline.test.ts	Adds tests for chunked byline repository methods (from #223).
packages/core/tests/unit/bylines/bylines-query.test.ts	Adds higher-volume bylines query test (from #223).
.changeset/cute-eagles-rescue.md	Changeset for byline/D1 batching fix (from #223).
.changeset/brave-seals-hydrate.md	Changeset for SEO/D1 batching fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-09T23:50:19Z

packages/core/src/utils/chunks.ts

+export function chunks<T>(arr: T[], size: number): T[][] {
+	if (arr.length === 0) return [];
+	const result: T[][] = [];
+	for (let i = 0; i < arr.length; i += size) {
+		result.push(arr.slice(i, i + size));
+	}


chunks() can enter an infinite loop when called with size <= 0 because i += size will never advance (or will go backwards). Since this helper is exported, add input validation (e.g., throw for non-positive sizes, or clamp to 1) to prevent hangs at runtime.

This function is from #223 (the base branch this PR stacks on), not from this PR's diff. The concern is valid in the abstract but in practice the only call sites use the exported SQL_BATCH_SIZE constant (50), so size <= 0 can't happen at runtime. If the maintainers want a guard, it belongs in #223 alongside the function definition.

SeoRepository.getMany builds a WHERE content_id IN (?, ?, ...) clause alongside a collection = ? filter. On Cloudflare D1, which caps bound parameters at 100 per query, passing 100 content ids produces 101 parameters and trips the limit: D1_ERROR: too many SQL variables at offset 369: SQLITE_ERROR This is the same root cause as the byline hydration fix in the sibling commit, but on a different repository that wasn't covered there. SeoRepository.getMany is called from handleContentList before hydrateBylinesMany, so on any collection with has_seo = 1 and >= 99 items, it's the first function to fail on the admin content list endpoint. Apply the same chunking pattern using the shared chunks() helper and SQL_BATCH_SIZE constant. Deduplicate contentIds before chunking for consistency with the byline fix. Pre-fill result with defaults so the two-pass resolve-then-fill-missing logic collapses to a single pass. Adds unit tests covering: - input size larger than SQL_BATCH_SIZE, real ids spread across chunks - all-missing ids get defaults - duplicate input ids resolve cleanly without duplicate rows Repro of the underlying D1 limit for the record: wrangler d1 execute <db> --remote --command \ "SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,...x100)" -> too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]

github-actions · 2026-04-10T03:24:02Z

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

fix: chunk byline IN clauses to stay within D1 SQL variable limit #223 (7 shared files)

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

baezor added 4 commits April 4, 2026 01:26

chore: add changeset for byline chunking fix

7090589

Merge branch 'main' into fix/byline-sql-batch-chunking

fddf1ee

Copilot AI review requested due to automatic review settings April 9, 2026 23:47

github-actions bot added area/core size/L cla: needed labels Apr 9, 2026

Copilot started reviewing on behalf of baezor April 9, 2026 23:48 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

baezor and others added 2 commits April 9, 2026 18:51

style: format

d8a56cc

baezor force-pushed the fix/seo-sql-batch-chunking branch from fe5e01b to d8a56cc Compare April 9, 2026 23:52

github-actions bot added cla: signed overlap and removed cla: needed labels Apr 9, 2026

github-actions bot mentioned this pull request Apr 10, 2026

fix: chunk byline IN clauses to stay within D1 SQL variable limit #223

Open

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: chunk SEO IN clause to stay within D1 SQL variable limit#422

fix: chunk SEO IN clause to stay within D1 SQL variable limit#422
baezor wants to merge 6 commits intoemdash-cms:mainfrom
baezor:fix/seo-sql-batch-chunking

baezor commented Apr 9, 2026

Uh oh!

changeset-bot bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

baezor Apr 9, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

baezor commented Apr 9, 2026

What does this PR do?

Changes specific to this PR (on top of #223)

Type of change

Checklist

AI-generated code disclosure

Repro

Test output (SEO fix + PR #223 byline suite)

Uh oh!

changeset-bot bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

github-actions bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

baezor Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 10, 2026

Overlapping PRs

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changeset-bot bot commented Apr 9, 2026 •

edited

Loading

github-actions bot commented Apr 9, 2026 •

edited

Loading