Summary
KnowledgeHarvester.ts has the pieces of a queue-based curation workflow, but not the lifecycle semantics needed to make it safe for scheduled or repeated use.
Related issue #1171 correctly covers the primary bug: harvested candidates can land directly in canonical KNOWLEDGE/<domain>/. This issue is narrower: even when _harvest-queue is used, the queue consumer currently mutates on read and immediately promotes candidates through the normal harvest path.
Why this matters
KNOWLEDGE/ is the curated long-term memory surface. A periodic learning rhythm needs a safe proposed-vs-canonical boundary:
1. collect candidates
2. review candidates
3. explicitly promote, reject, or edit them
4. regenerate indexes
The current queue behavior collapses steps 2 and 3. If someone wires KnowledgeHarvester into a weekly cron, launchd job, or session-end hook, queue candidates can be silently promoted and deleted from the queue without review.
Evidence
In PAI/TOOLS/KnowledgeHarvester.ts:
- HARVEST_QUEUE_DIR points to KNOWLEDGE/_harvest-queue.
- scanHarvestQueue reads queue JSON files, converts them into candidates, then deletes the queue file with fs.unlinkSync.
- the normal harvest command appends those queue candidates into allCandidates.
- writeNote writes candidates directly to KNOWLEDGE/<domain>/<slug>.md.
So _harvest-queue currently behaves as a destructive ingest source, not a durable review queue.
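For contrast, a non-destructive read could look roughly like the sketch below. It is not the current implementation: the function name listQueue, the QueuedCandidate fields, and the choice to report malformed files instead of dropping them are all assumptions made for illustration.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical candidate shape; the real queue schema may differ.
interface QueuedCandidate {
  slug: string;
  domain: string;
  title: string;
  harvested_from: string;
}

interface QueueListing {
  candidates: QueuedCandidate[];
  unreadable: string[]; // file names that could not be parsed as JSON
}

// Read-only scan of the queue directory. It parses each *.json file and
// returns the result; it never unlinks, renames, or rewrites anything.
export function listQueue(queueDir: string): QueueListing {
  const listing: QueueListing = { candidates: [], unreadable: [] };
  if (!fs.existsSync(queueDir)) return listing;

  for (const name of fs.readdirSync(queueDir).sort()) {
    if (!name.endsWith(".json")) continue;
    try {
      const raw = fs.readFileSync(path.join(queueDir, name), "utf8");
      listing.candidates.push(JSON.parse(raw) as QueuedCandidate);
    } catch {
      // Leave the file in place and surface it to the reviewer.
      listing.unreadable.push(name);
    }
  }
  return listing;
}
```

Calling listQueue twice in a row should return the same candidates, which is the property a scheduled job needs from the queue.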
There is an adjacent input-quality issue: scanResearch walks Markdown files primarily by length, so scaffolding files such as README.md can become canonical notes like Research/readme.md unless filename/frontmatter filters are added before queueing or promoting.
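One possible shape for such a filter is sketched below. The name isScaffolding, the deny-list contents, and the `generated: true` frontmatter marker are assumptions for illustration; the actual naming of generated dashboard files is not specified here and would need to be confirmed.

```typescript
import * as path from "node:path";

// Hypothetical deny-list of file names that are scaffolding, not knowledge.
const SCAFFOLDING_NAMES = new Set(["readme.md", "_index.md"]);

// Returns true when a Markdown file should be skipped by the research scan.
// `content` is the file text; only its leading frontmatter block is inspected.
export function isScaffolding(filePath: string, content: string): boolean {
  const base = path.basename(filePath).toLowerCase();
  if (SCAFFOLDING_NAMES.has(base)) return true;

  // Assumed convention: generated files declare `generated: true`
  // inside a frontmatter block at the very top of the file.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(content);
  if (frontmatter && /^generated:\s*true\s*$/m.test(frontmatter[1])) {
    return true;
  }
  return false;
}
```

Running this check before the length heuristic would keep a long README.md from ever becoming a candidate.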
Suggested lifecycle
Add explicit verbs and make mutation opt-in:
- harvest --enqueue or default harvest writes candidates to _harvest-queue.
- review lists queued candidates without mutation.
- promote <slug> writes one reviewed item into KNOWLEDGE/<domain>/.
- reject <slug> archives or deletes one reviewed item.
- promote-all --domain Ideas exists only as an explicit batch operation.
- default harvest does not consume _harvest-queue unless passed an explicit flag such as --from-queue or --promote.
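As a rough sketch of how the promote and reject verbs might behave, assuming a queue file per slug: the function signatures, the QueuedCandidate fields, the _rejected archive directory, and the frontmatter layout below are all hypothetical and not taken from KnowledgeHarvester.ts.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical candidate shape; the real queue schema may differ.
interface QueuedCandidate {
  slug: string;
  domain: string;
  title: string;
  body: string;
  harvested_from: string;
}

// Loads one queued candidate by slug. Rejects slugs containing path separators.
function loadCandidate(knowledgeDir: string, slug: string): { src: string; candidate: QueuedCandidate } {
  if (path.basename(slug) !== slug) throw new Error(`invalid slug: ${slug}`);
  const src = path.join(knowledgeDir, "_harvest-queue", `${slug}.json`);
  return { src, candidate: JSON.parse(fs.readFileSync(src, "utf8")) as QueuedCandidate };
}

// promote <slug>: write one reviewed item into KNOWLEDGE/<domain>/,
// carrying harvested_from into the note, then remove it from the queue.
export function promote(knowledgeDir: string, slug: string): string {
  const { src, candidate } = loadCandidate(knowledgeDir, slug);
  const destDir = path.join(knowledgeDir, candidate.domain);
  fs.mkdirSync(destDir, { recursive: true });
  const dest = path.join(destDir, `${slug}.md`);
  const note = [
    "---",
    `title: ${JSON.stringify(candidate.title)}`,
    `harvested_from: ${JSON.stringify(candidate.harvested_from)}`,
    "---",
    "",
    candidate.body,
    "",
  ].join("\n");
  // "wx" fails instead of overwriting an existing canonical note.
  fs.writeFileSync(dest, note, { flag: "wx" });
  fs.unlinkSync(src);
  return dest;
}

// reject <slug>: move the item to an archive with a reason, leaving an audit trail.
export function reject(knowledgeDir: string, slug: string, reason: string): string {
  const { src, candidate } = loadCandidate(knowledgeDir, slug);
  const archiveDir = path.join(knowledgeDir, "_harvest-queue", "_rejected"); // assumed location
  fs.mkdirSync(archiveDir, { recursive: true });
  const dest = path.join(archiveDir, `${slug}.json`);
  const record = { ...candidate, rejected_reason: reason, rejected_at: new Date().toISOString() };
  fs.writeFileSync(dest, JSON.stringify(record, null, 2));
  fs.unlinkSync(src);
  return dest;
}
```

In this sketch the queue file is removed only after the canonical note or the rejection record has been written, so a failure partway through leaves the candidate in the queue rather than losing it.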
Acceptance criteria
- Running harvest cannot silently promote or delete queued candidates.
- Running review is read-only.
- Promotion preserves harvested_from provenance.
- Rejection leaves either an audit trail or an explicit deletion record.
- Research scanning excludes obvious scaffolding files such as README.md, _index.md, and generated dashboard files.
- Documentation describes the intended learning cadence: candidate capture vs curated knowledge promotion.
Related
This is intended as the operational half of #1171: once the queue exists, the review/promote lifecycle needs safe defaults before KnowledgeHarvester is suitable for automation.