
KnowledgeHarvester queue needs explicit review/promote lifecycle before scheduled use #1351

@ryan-baum

Summary

KnowledgeHarvester.ts has the pieces of a queue-based curation workflow, but not the lifecycle semantics needed to make it safe for scheduled or repeated use.

Related issue #1171 correctly covers the primary bug: harvested candidates can land directly in canonical KNOWLEDGE/<domain>/. This issue is narrower: even when _harvest-queue is used, the queue consumer currently mutates on read and immediately promotes candidates through the normal harvest path.

Why this matters

KNOWLEDGE/ is the curated long-term memory surface. A periodic learning rhythm needs a safe proposed-vs-canonical boundary:

  1. collect candidates
  2. review candidates
  3. explicitly promote, reject, or edit them
  4. regenerate indexes

The current queue behavior collapses steps 2 and 3 into a single automatic promotion. If someone wires KnowledgeHarvester into a weekly cron, launchd job, or session-end hook, queue candidates can be silently promoted and deleted from the queue without review.

Evidence

In PAI/TOOLS/KnowledgeHarvester.ts:

  • HARVEST_QUEUE_DIR points to KNOWLEDGE/_harvest-queue.
  • scanHarvestQueue reads queue JSON files, converts them into candidates, then deletes the queue file with fs.unlinkSync.
  • The normal harvest command appends those queue candidates to allCandidates.
  • writeNote writes candidates directly to KNOWLEDGE/<domain>/<slug>.md.

So _harvest-queue currently behaves as a destructive ingest source, not a durable review queue.
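For contrast, a non-destructive read could look roughly like the sketch below. It is not the existing scanHarvestQueue: the function name listQueue, the QueuedCandidate shape, and the assumption that each queue entry is a single JSON file named after its slug are all hypothetical, since the real queue schema is not shown in this issue.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical shape: the real queue file schema is not shown in this issue.
interface QueuedCandidate {
  slug: string;
  domain?: string;
  harvested_from?: string;
}

// Read-only listing: reads each queue JSON file and never unlinks,
// renames, or rewrites anything in the queue directory.
function listQueue(queueDir: string): QueuedCandidate[] {
  if (!fs.existsSync(queueDir)) return [];
  return fs
    .readdirSync(queueDir)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => {
      const raw = fs.readFileSync(path.join(queueDir, name), "utf8");
      const parsed = JSON.parse(raw) as Partial<QueuedCandidate>;
      // Assumption: fall back to the file name when the entry has no slug.
      return { ...parsed, slug: parsed.slug ?? path.basename(name, ".json") };
    });
}
```

A review command could print the result of listQueue and exit, which leaves the queue untouched no matter how often it runs.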

There is an adjacent input-quality issue: scanResearch walks Markdown files and selects them primarily by length, so scaffolding files such as README.md can become canonical notes such as Research/readme.md. Filename or frontmatter filters need to run before queueing or promoting.
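One possible shape for such a filter is sketched below. The helper name isScaffoldingFile is hypothetical; README.md and _index.md come from this issue, while the dashboard filename pattern and the `harvest: false` frontmatter opt-out are assumptions made for illustration only.

```typescript
import * as path from "path";

// README.md and _index.md are named in this issue; the dashboard pattern
// is only a guess at what "generated dashboard files" look like.
const SCAFFOLDING_NAMES = new Set(["readme.md", "_index.md"]);
const SCAFFOLDING_PATTERNS = [/dashboard/i];

// Returns true when a Markdown file should be skipped before queueing.
function isScaffoldingFile(filePath: string, content: string): boolean {
  const name = path.basename(filePath).toLowerCase();
  if (SCAFFOLDING_NAMES.has(name)) return true;
  if (SCAFFOLDING_PATTERNS.some((re) => re.test(name))) return true;
  // Assumed convention: a leading frontmatter block with `harvest: false`.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
  if (frontmatter === null) return false;
  return /^harvest:\s*false\s*$/m.test(frontmatter[1] ?? "");
}
```

Running this check ahead of any length threshold would keep a long README.md out of the queue regardless of its size.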

Suggested lifecycle

Add explicit verbs and make mutation opt-in:

  • harvest --enqueue (or harvest with no flags) writes candidates to _harvest-queue.
  • review lists queued candidates without mutation.
  • promote <slug> writes one reviewed item into KNOWLEDGE/<domain>/.
  • reject <slug> archives or deletes one reviewed item.
  • promote-all --domain Ideas exists only as an explicit batch operation.
  • default harvest does not consume _harvest-queue unless passed an explicit flag such as --from-queue or --promote.
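As a rough illustration of the promote and reject verbs above, the sketch below handles one slug at a time. Everything in it is an assumption rather than existing KnowledgeHarvester.ts code: the QueueItem fields, the one-JSON-file-per-slug layout, the frontmatter written into the note, and the _rejected archive directory are hypothetical.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical queue entry shape, for illustration only.
interface QueueItem {
  domain: string;
  body: string;
  harvested_from: string;
}

// Reject names that could escape the intended directory.
function safeName(name: string): string {
  if (!/^[\w-]+$/.test(name)) throw new Error(`unsafe name: ${name}`);
  return name;
}

function queueFile(queueDir: string, slug: string): string {
  return path.join(queueDir, `${safeName(slug)}.json`);
}

// Promote one reviewed item into <knowledgeDir>/<domain>/<slug>.md.
// The queue file is removed only after the note has been written.
function promote(knowledgeDir: string, queueDir: string, slug: string): string {
  const source = queueFile(queueDir, slug);
  const item = JSON.parse(fs.readFileSync(source, "utf8")) as QueueItem;
  const target = path.join(knowledgeDir, safeName(item.domain), `${slug}.md`);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  const note = `---\nharvested_from: ${item.harvested_from}\n---\n\n${item.body}\n`;
  // The "wx" flag fails if the note exists, so curated content is never overwritten.
  fs.writeFileSync(target, note, { flag: "wx" });
  fs.unlinkSync(source);
  return target;
}

// Reject one item by moving it into an archive directory (audit trail).
function reject(queueDir: string, slug: string): string {
  const archiveDir = path.join(queueDir, "_rejected");
  fs.mkdirSync(archiveDir, { recursive: true });
  const dest = path.join(archiveDir, `${safeName(slug)}.json`);
  fs.renameSync(queueFile(queueDir, slug), dest);
  return dest;
}
```

Because both functions take an explicit slug, a scheduled harvest that never calls them cannot change canonical notes or shrink the queue; a batch promote-all would be a separate, deliberately invoked loop over promote.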

Acceptance criteria

  • Running harvest cannot silently promote or delete queued candidates.
  • Running review is read-only.
  • Promotion preserves harvested_from provenance.
  • Rejection leaves either an audit trail or an explicit deletion record.
  • Research scanning excludes obvious scaffolding files such as README.md, _index.md, and generated dashboard files.
  • Documentation describes the intended learning cadence: candidate capture vs curated knowledge promotion.

Related

This is intended as the operational half of #1171: once the queue exists, the review/promote lifecycle needs safe defaults before KnowledgeHarvester is suitable for automation.
