Skip to content

fix: chunk SEO IN clause to stay within D1 SQL variable limit#422

Open
baezor wants to merge 6 commits intoemdash-cms:mainfrom
baezor:fix/seo-sql-batch-chunking
Open

fix: chunk SEO IN clause to stay within D1 SQL variable limit#422
baezor wants to merge 6 commits intoemdash-cms:mainfrom
baezor:fix/seo-sql-batch-chunking

Conversation

@baezor
Copy link
Copy Markdown

@baezor baezor commented Apr 9, 2026

Follow-up to #223 — same root cause, different repository.

Note on the diff: this PR is branched on top of fix/byline-sql-batch-chunking (#223) because it reuses the chunks() helper and SQL_BATCH_SIZE constant introduced there. Until #223 merges, the diff here will include #223's changes too. Once #223 lands, I'll rebase onto main and the diff will collapse to just the SEO fix and its tests. Please land #223 first.

What does this PR do?

Fixes the same unbounded IN (?, ?, …) pattern — this time in SeoRepository.getMany — that trips Cloudflare D1's 100-parameter limit on collections with ≥99 items.

SeoRepository.getMany builds:

.selectFrom("_emdash_seo")
.where("collection", "=", collection)       // 1 bound param
.where("content_id", "in", contentIds)      // N bound params

With 100 content ids that's 101 bound params → D1 rejects with:

Content list error: Error: D1_ERROR: too many SQL variables at offset 369: SQLITE_ERROR

This matters because SeoRepository.getMany is called from handleContentList before hydrateBylinesMany:

// packages/core/src/api/handlers/content.ts:257-259
const hasSeo = await collectionHasSeo(db, collection);
await hydrateSeoMany(db, collection, result.items, hasSeo);
await hydrateBylinesMany(db, collection, result.items);

So on any collection with has_seo = 1 and ≥99 items, it's the first function to fail on the admin content list endpoint — #223 on its own would still leave those installs broken. I hit this at prepain.mx with 124 posts and has_seo = 1 on the posts collection; see the thread on #223 for the field repro.

Changes specific to this PR (on top of #223)

  • packages/core/src/database/repositories/seo.ts
  • packages/core/tests/unit/database/repositories/seo.test.ts (new)
    • getMany handles more IDs than SQL_BATCH_SIZE — real ids spread across multiple chunks
    • getMany returns defaults for every input id when no rows exist
    • getMany deduplicates repeated content IDs without duplicate rows
  • .changeset/brave-seals-hydrate.md (patch)

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Performance improvement
  • Tests
  • Chore

Checklist

AI-generated code disclosure

  • This PR includes AI-generated code (drafted in-session while debugging a production incident, reviewed and tested locally)

Repro

wrangler d1 execute <db> --remote --command \
  "SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,…×100)"
# → too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]

Test output (SEO fix + PR #223 byline suite)

 ✓ tests/unit/utils/chunks.test.ts (7 tests) 2ms
 ✓ tests/unit/database/repositories/seo.test.ts (3 tests) 68ms
 ✓ tests/unit/database/repositories/byline.test.ts (8 tests) 141ms
 ✓ tests/unit/bylines/bylines-query.test.ts (11 tests) 197ms
 ✓ tests/unit/database/repositories/content.test.ts (47 tests) 665ms

 Test Files  5 passed (5)
      Tests  76 passed (76)

baezor added 4 commits April 4, 2026 01:26
Fixes emdash-cms#219. hydrateEntryBylines builds unbounded IN (?, ?, …) clauses
that exceed Cloudflare D1's bound-parameter limit on large collections.

Adds a chunks() utility and applies it defense-in-depth at the
repository level: getContentBylinesMany, findByUserIds, and
getAuthorIds now batch IDs in groups of 50.
Deduplicates contentIds in getContentBylinesMany to prevent duplicate
credits when the same ID appears across chunk boundaries. Adds tests
for the duplication edge case and an end-to-end getBylinesForEntries
test spanning both explicit and inferred byline paths.
Copilot AI review requested due to automatic review settings April 9, 2026 23:47
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 9, 2026

🦋 Changeset detected

Latest commit: d8a56cc

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
emdash Patch
@emdash-cms/cloudflare Patch
@emdash-cms/plugin-ai-moderation Patch
@emdash-cms/plugin-atproto Patch
@emdash-cms/plugin-audit-log Patch
@emdash-cms/plugin-color Patch
@emdash-cms/plugin-embeds Patch
@emdash-cms/plugin-forms Patch
@emdash-cms/plugin-webhook-notifier Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Cloudflare D1’s bound-parameter limit by chunking large IN (...) queries (notably SEO hydration via SeoRepository.getMany), building on the chunking utilities and byline batching introduced in #223.

Changes:

  • Added chunks() + SQL_BATCH_SIZE utility to cap IN (...) batch sizes.
  • Updated SeoRepository.getMany to dedupe IDs and query in chunks while returning defaults for missing rows.
  • Added/updated unit tests covering chunking behavior and large-batch SEO/byline hydration, plus patch changesets.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/core/src/utils/chunks.ts Introduces chunking helper and batch-size constant for SQL IN (...) queries.
packages/core/src/database/repositories/seo.ts Chunks content_id IN (...) in getMany to avoid D1 SQL variable limit.
packages/core/src/database/repositories/byline.ts Chunks IN (...) queries for byline hydration (from #223).
packages/core/src/bylines/index.ts Chunks raw SQL id IN (...) author-id lookup (from #223).
packages/core/tests/unit/utils/chunks.test.ts Adds unit tests for chunk helper + SQL_BATCH_SIZE.
packages/core/tests/unit/database/repositories/seo.test.ts New tests for chunked SEO hydration behavior.
packages/core/tests/unit/database/repositories/byline.test.ts Adds tests for chunked byline repository methods (from #223).
packages/core/tests/unit/bylines/bylines-query.test.ts Adds higher-volume bylines query test (from #223).
.changeset/cute-eagles-rescue.md Changeset for byline/D1 batching fix (from #223).
.changeset/brave-seals-hydrate.md Changeset for SEO/D1 batching fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +7 to +12
export function chunks<T>(arr: T[], size: number): T[][] {
if (arr.length === 0) return [];
const result: T[][] = [];
for (let i = 0; i < arr.length; i += size) {
result.push(arr.slice(i, i + size));
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunks() can enter an infinite loop when called with size <= 0 because i += size will never advance (or will go backwards). Since this helper is exported, add input validation (e.g., throw for non-positive sizes, or clamp to 1) to prevent hangs at runtime.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is from #223 (the base branch this PR stacks on), not from this PR's diff. The concern is valid in the abstract but in practice the only call sites use the exported SQL_BATCH_SIZE constant (50), so size <= 0 can't happen at runtime. If the maintainers want a guard, it belongs in #223 alongside the function definition.

baezor and others added 2 commits April 9, 2026 18:51
SeoRepository.getMany builds a WHERE content_id IN (?, ?, ...)
clause alongside a collection = ? filter. On Cloudflare D1, which
caps bound parameters at 100 per query, passing 100 content ids
produces 101 parameters and trips the limit:

  D1_ERROR: too many SQL variables at offset 369: SQLITE_ERROR

This is the same root cause as the byline hydration fix in the
sibling commit, but on a different repository that wasn't covered
there. SeoRepository.getMany is called from handleContentList
before hydrateBylinesMany, so on any collection with has_seo = 1
and >= 99 items, it's the first function to fail on the admin
content list endpoint.

Apply the same chunking pattern using the shared chunks() helper
and SQL_BATCH_SIZE constant. Deduplicate contentIds before
chunking for consistency with the byline fix. Pre-fill result
with defaults so the two-pass resolve-then-fill-missing logic
collapses to a single pass.

Adds unit tests covering:
  - input size larger than SQL_BATCH_SIZE, real ids spread across chunks
  - all-missing ids get defaults
  - duplicate input ids resolve cleanly without duplicate rows

Repro of the underlying D1 limit for the record:

  wrangler d1 execute <db> --remote --command \
    "SELECT 1 WHERE 'x' = ? AND 1 IN (?,?,...x100)"
  -> too many SQL variables at offset 231: SQLITE_ERROR [code: 7500]
@github-actions
Copy link
Copy Markdown
Contributor

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants