Fix batch worker writing extraction.json to wrong product directory (close #26) #27

Open
kiannidev wants to merge 1 commit into aglover1221:main from kiannidev:fix/batch-worker-product-dir

@kiannidev
Summary

Fixes #26: the Anthropic batch poll worker wrote extraction.json to the flat {DATA_DIR}/{slug}/ directory instead of the product's nested path under DATA_DIR (e.g. server/dell/poweredge/r770/).

Related Issue

Fixes #26

Change Type

  • Bug fix
  • Regression tests
  • New feature
  • UI change
  • Security fix
  • Documentation only

What Changed

  • Added productPathRel to ProductContext and exported helpers:
    • batchCustomId(ctx) — full nested path used as Anthropic Batch custom_id
    • resolveBatchProductDir(customId) — resolves write directory with data-root confinement
  • submitBatchForCategory() now sets custom_id to the full product path, not bare slug
  • Batch worker uses resolveBatchProductDir() instead of path.resolve(DATA_DIR, slug)
  • insertResult() stores the batch custom id so worker DB lookup matches Anthropic results
  • Run detail page links to /products/[slug] using basename when stored path is nested
  • writeExtractionJson() calls invalidateExtractionCache() after every write
  • Added tests/unit/extract-batch.test.ts (4 cases)
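
The two helpers listed above could look roughly like the sketch below. It is not the code from lib/pipeline/extract.ts: the `slug` field, the `dataDir` parameter, and the `DATA_DIR` value are assumptions made for illustration, and only `productPathRel`, `batchCustomId`, and `resolveBatchProductDir` are names taken from this PR. The confinement check shown here (resolve, then compare against the root with `path.relative`) is one common way to do it, not necessarily the one used in the change.

```typescript
import * as path from "node:path";

// Hypothetical shape: only productPathRel is named in this PR; slug is assumed.
interface ProductContext {
  slug: string;           // e.g. "r770"
  productPathRel: string; // e.g. "server/dell/poweredge/r770"
}

// Assumed data root for the sketch; the project reads DATA_DIR from its own config.
const DATA_DIR = path.resolve("data/sample");

/** Full nested path with forward slashes and no leading or trailing slash. */
function batchCustomId(ctx: ProductContext): string {
  return ctx.productPathRel.replace(/\\/g, "/").replace(/^\/+|\/+$/g, "");
}

/** Resolve the write directory and reject anything that leaves the data root. */
function resolveBatchProductDir(customId: string, dataDir: string = DATA_DIR): string {
  const root = path.resolve(dataDir);
  const target = path.resolve(root, customId);
  const rel = path.relative(root, target);
  const escapes =
    rel === "" ||                      // custom_id resolves to the root itself
    rel === ".." ||
    rel.startsWith(".." + path.sep) || // traversal such as "../outside"
    path.isAbsolute(rel);              // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`custom_id resolves outside the data root: ${customId}`);
  }
  return target;
}
```

With this shape, a bare slug such as `r770` still resolves to `{DATA_DIR}/r770`, so the fix depends on `submitBatchForCategory()` submitting the full path as the custom_id, as described in the list above.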

Files Changed

| File | Change |
| --- | --- |
| lib/pipeline/extract.ts | productPathRel, batch helpers, custom_id fix, cache invalidation |
| worker/handlers/anthropic-batch.ts | Resolve nested product dir from custom_id |
| app/api/pipeline/extract/route.ts | Store batch custom id in extraction_results |
| scripts/extract-one.ts | Same insert fix for CLI batch mode |
| app/pipeline/runs/[id]/page.tsx | Product page links use basename of nested path |
| tests/unit/extract-batch.test.ts | New regression tests |

Validation

./node_modules/.bin/tsc --noEmit   # pass
npm run build                      # pass
npm test                           # run locally (4 new + existing tests)

Test Plan

  • Submit batch via API for r770 under server category
  • Run worker until batch completes
  • Confirm extraction.json written to data/sample/server/dell/poweredge/r770/
  • Confirm data/sample/r770/ is NOT created
  • Reload product page — new extraction visible immediately (cache invalidated)
  • Run detail page product link opens /products/r770
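
The last four checks in the plan can be replayed against a throwaway directory with a sketch like the one below. Everything here is illustrative: the in-memory `Map` cache, `readExtraction`, `productHref`, and the function signatures are assumptions, and only the names `writeExtractionJson` and `invalidateExtractionCache` come from this PR.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical cache keyed by product directory; the project's real cache is not shown in this PR.
const extractionCache = new Map<string, unknown>();

function invalidateExtractionCache(productDir: string): void {
  extractionCache.delete(productDir);
}

// Hypothetical cached reader, standing in for whatever the product page uses.
function readExtraction(productDir: string): unknown {
  if (!extractionCache.has(productDir)) {
    const raw = fs.readFileSync(path.join(productDir, "extraction.json"), "utf8");
    extractionCache.set(productDir, JSON.parse(raw));
  }
  return extractionCache.get(productDir);
}

// Write, then invalidate, so the next read sees the new file.
function writeExtractionJson(productDir: string, data: unknown): void {
  fs.mkdirSync(productDir, { recursive: true });
  fs.writeFileSync(path.join(productDir, "extraction.json"), JSON.stringify(data, null, 2));
  invalidateExtractionCache(productDir);
}

// Hypothetical link helper: the route uses only the last segment of a nested stored path.
function productHref(storedPath: string): string {
  return `/products/${path.posix.basename(storedPath)}`;
}

// Usage: replay the plan against a temporary data root.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "extract-demo-"));
const productDir = path.join(root, "server", "dell", "poweredge", "r770");
writeExtractionJson(productDir, { revision: 1 });
readExtraction(productDir);                       // warms the cache
writeExtractionJson(productDir, { revision: 2 }); // evicts the cached copy
console.log(readExtraction(productDir));                // the revision 2 object
console.log(fs.existsSync(path.join(root, "r770")));    // false: no flat directory
console.log(productHref("server/dell/poweredge/r770")); // /products/r770
```

Without the invalidation call in `writeExtractionJson`, the second read in this sketch would return the stale revision 1 object, which is the behaviour the "visible immediately" check is meant to catch.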

Use full productPathRel as Anthropic Batch custom_id so the poll worker
writes to the correct nested directory. Invalidate extraction cache on write.
Fixes aglover1221#26

Development

Successfully merging this pull request may close these issues.

[bug] batch worker writes extraction.json to {DATA_DIR}/{slug}/ instead of nested product path
