feat(scripts): safer reclassify_with_llm.py with provider flags + tighter prompt #164

Open
flintfromthebasement wants to merge 1 commit into verygoodplugins:main from flintfromthebasement:feat/reclassify-script-improvements

Conversation

@flintfromthebasement (Contributor)

## Why

scripts/reclassify_with_llm.py is a one-shot maintenance tool, but in its current form it's all-or-nothing: there's no way to dry-run it, no way to sample a subset, and no way to point it at anything except OpenAI. That makes it scary to actually run on a real corpus, and impossible to benchmark alternative classification models without forking.

This PR makes it safe to use on a production corpus and provider-agnostic, plus tightens the classification prompt based on a 100-memory benchmark.

## What changes

### 1. CLI flags for safe partial runs

| Flag | Purpose |
| --- | --- |
| `--limit N` | Cap memories processed |
| `--sample N` | Random sample of N memories (instead of "first N") |
| `--seed N` | Reproducibility for sampled runs |
| `--dry-run` | Classify but don't write back to FalkorDB |
| `--yes` | Skip the interactive confirmation prompt |
| `--provider P` | `openai` or `openrouter` (default: `openai`) |
| `--model M` | Override `CLASSIFICATION_MODEL` per-run |

Typical workflow now:

```bash
# 1. Sanity-check on 100 random memories, no writes
./scripts/reclassify_with_llm.py --sample 100 --seed 42 --dry-run

# 2. If the distribution looks right, commit to the full pass
./scripts/reclassify_with_llm.py --yes
```
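The flag semantics above could be wired up roughly as follows. This is a hedged sketch, not the script's actual internals: `parse_args` and `select_memories` are illustrative names, and the real script may structure this differently.

```python
import argparse
import random


def parse_args(argv=None):
    """Hypothetical CLI wiring for the flags described in the table above."""
    p = argparse.ArgumentParser(description="Reclassify memories with an LLM")
    p.add_argument("--limit", type=int, help="cap the number of memories processed")
    p.add_argument("--sample", type=int, help="random sample of N memories")
    p.add_argument("--seed", type=int, help="seed for reproducible sampling")
    p.add_argument("--dry-run", action="store_true", help="classify but don't write back")
    p.add_argument("--yes", action="store_true", help="skip the confirmation prompt")
    p.add_argument("--provider", choices=["openai", "openrouter"], default="openai")
    p.add_argument("--model", help="override CLASSIFICATION_MODEL for this run")
    return p.parse_args(argv)


def select_memories(memories, args):
    """Apply --sample/--seed or --limit to the fetched memory list."""
    if args.sample:
        rng = random.Random(args.seed)  # seeded RNG => reproducible sample
        return rng.sample(memories, min(args.sample, len(memories)))
    if args.limit:
        return memories[: args.limit]  # plain "first N" cap
    return memories
```

The key difference between `--sample` and `--limit` is that sampling draws uniformly across the corpus rather than taking whatever the query returns first, which matters when memory types cluster by insertion order.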

### 2. OpenRouter / OpenAI-compatible provider support

Adds three env vars: `OPENROUTER_API_KEY`, `CLASSIFICATION_BASE_URL`, and `CLASSIFICATION_API_KEY`. The same script can now target OpenRouter, LiteLLM, vLLM, Azure, or any OpenAI-compatible endpoint without code changes.
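A plausible resolution order for these variables, sketched as a pure function so it's easy to test. This is an assumption about the precedence (explicit `CLASSIFICATION_*` overrides win, then provider-specific defaults); the function name and the OpenRouter default URL are illustrative, not quoted from the script.

```python
def resolve_endpoint(provider: str, env: dict):
    """Pick (base_url, api_key) for the classification client.

    Assumed precedence: explicit CLASSIFICATION_BASE_URL / CLASSIFICATION_API_KEY
    beat everything; otherwise the --provider flag selects sensible defaults.
    """
    base_url = env.get("CLASSIFICATION_BASE_URL")
    api_key = env.get("CLASSIFICATION_API_KEY")
    if provider == "openrouter":
        base_url = base_url or "https://openrouter.ai/api/v1"
        api_key = api_key or env.get("OPENROUTER_API_KEY")
    else:
        # base_url stays None so the OpenAI SDK uses its built-in default.
        api_key = api_key or env.get("OPENAI_API_KEY")
    return base_url, api_key
```

Because `CLASSIFICATION_BASE_URL` wins unconditionally, pointing the script at LiteLLM or vLLM is just a matter of setting that one variable.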

Includes a tolerant JSON extractor for models that don't honor `response_format=json` (Gemini families on OpenRouter return prose-wrapped JSON, which otherwise crashes the strict parser).
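A minimal sketch of what such a tolerant extractor can look like: try strict parsing first, then markdown fences, then the first brace-delimited span. The function name and fallback order are assumptions, not the script's exact implementation.

```python
import json
import re


def extract_json(text: str) -> dict:
    """Best-effort JSON extraction for models that wrap JSON in prose
    or markdown fences instead of honoring response_format=json."""
    # Fast path: the whole response is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Next: a markdown code fence like ```json ... ```
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Last resort: the outermost {...} span anywhere in the text.
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        return json.loads(brace.group(0))
    raise ValueError("no JSON object found in model response")
```

The greedy `\{.*\}` grabs from the first `{` to the last `}`, which keeps nested objects intact at the cost of failing on responses containing multiple unrelated JSON blobs; for a single-classification response that trade-off is fine.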

### 3. Tightened SYSTEM_PROMPT

The prior prompt was a loose 7-bullet type list. New prompt has strict definitions, keyword cues, and explicit priority rules ("Fact:" and descriptive statements go to Context, not Insight; chat/DM fragments aren't Decisions just because they contain "decided").

Empirical impact on a 100-memory sample (Gemini 3.1 Flash-Lite via OpenRouter):

| Type | Before (loose) | After (strict) |
| --- | --- | --- |
| Insight | 56% (catch-all) | 8% |
| False Decisions on DM/session fragments | several | 0 |
| Context, Pattern, Habit | underused | distribution closer to intent |

## Out of scope

- The startup-tick guard from the same flint-branch commit (`automem/consolidation/runtime_scheduler.py`) is not in this PR — it's a separate concern (FalkorDB RDB-loading race at init) and will land as its own PR.
- `discover_creative_associations` (the rule-based "dreaming" edge inference in `consolidation.py`) is unchanged here. There's an open thought to LLM-replace that with the same Gemini 3.1 Flash-Lite + tight-prompt pattern — happy to file a separate issue if it's interesting to benchmark.

## Test plan

- `./scripts/reclassify_with_llm.py --help` shows all new flags
- `--dry-run --sample 10` runs against a dev FalkorDB without writing
- `--provider openrouter --model google/gemini-3.1-flash-lite-preview --sample 10 --dry-run` works end-to-end
- Default behavior (no flags, OpenAI) is unchanged from the prior script

feat(scripts): safer reclassify_with_llm.py with provider flags + tighter prompt

Three improvements to scripts/reclassify_with_llm.py to make it safe to
run on a real corpus and easy to retarget at different LLM providers.

1. CLI flags for safe partial runs:
   - --limit N        cap the number of memories processed
   - --sample N       random sample N memories (instead of first N)
   - --seed N         reproducibility for sampled runs
   - --dry-run        classify but don't write back to FalkorDB
   - --yes            skip the interactive confirmation prompt
   - --provider P     openai | openrouter (default: openai)
   - --model M        override CLASSIFICATION_MODEL per-run

   Lets you do a 100-memory sanity-check pass before committing to a
   full reclassification across thousands of records. The prior version
   was all-or-nothing.

2. OpenRouter / OpenAI-compatible support:
   - Adds OPENROUTER_API_KEY, CLASSIFICATION_BASE_URL, CLASSIFICATION_API_KEY
     env vars so the same script can target any OpenAI-compatible endpoint
     (OpenRouter, LiteLLM, vLLM, Azure, etc.) without code changes.
   - Adds a tolerant JSON extractor for models that don't honor
     response_format=json (e.g. Gemini families on OpenRouter), which
     otherwise return prose-wrapped JSON and crash the strict parser.

3. Tightened SYSTEM_PROMPT:
   - Replaces the loose 7-bullet type list with strict definitions,
     keyword cues, and explicit priority rules ("Fact:" / descriptive
     statements go to Context, not Insight; chat/DM fragments aren't
     Decisions just because they contain the word "decided").
   - Empirical impact on a 100-memory sample using Gemini 3.1 Flash-Lite:
     - Insight share: 56% → 8% (was being used as a catch-all)
     - False Decision calls on session/DM fragments eliminated
     - Pattern, Context, Habit usage closer to the intended distribution

The script remains a one-shot maintenance tool — typically run after a
model swap, prompt change, or large bulk import — not a recurring task.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jack-arturo pushed a commit that referenced this pull request May 1, 2026
…B load race (#165)

## Why

When `init_consolidation_scheduler()` runs a tick **immediately** after
spawning the worker thread, FalkorDB can still be loading its RDB
snapshot from disk. Every Redis command during that window returns:

> `LOADING Redis is loading the dataset in memory`

The eager tick catches the error, logs it, and bumps `last_run`
timestamps — silently skipping the day's decay / creative / cluster work
until tomorrow. The bigger the corpus, the longer the RDB load, the more
reliably this fires. On any restart-on-deploy host (Railway, Docker,
systemd) with a few thousand memories, it hits every deploy.

## What changes

One line in `automem/consolidation/runtime_scheduler.py:100` — drop the
eager `run_consolidation_tick_fn()` call after starting the worker
thread, and add a comment explaining why.

```diff
     state.consolidation_thread.start()
-    run_consolidation_tick_fn()
+    # Skip eager first tick: FalkorDB may still be loading its RDB snapshot at
+    # startup and the "Redis is loading the dataset in memory" error poisons
+    # the day's decay/creative run. The worker loop will fire its first tick
+    # after consolidation_tick_seconds, which is plenty of warm-up time.
     logger.info("Consolidation scheduler initialized")
```

## Why this is safe

- The worker loop still fires within `CONSOLIDATION_TICK_SECONDS`
(default 3600s = 1h). For decay/creative/cluster intervals measured in
days, a one-tick startup delay is invisible.
- The scheduler is timestamp-driven (`last_run` per task), not
edge-triggered. Missed intervals get picked up by the next loop
iteration — nothing is "lost" by deferring.
- Failure mode flips from "silent broken run" to "no run yet, will run
shortly" — strictly better.
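The timestamp-driven property described above can be sketched in a few lines. This is an illustrative model of the mechanism, not the actual `runtime_scheduler.py` code: task names, the `(interval, fn)` task shape, and `run_tick` are all hypothetical.

```python
def run_tick(tasks, last_run, now):
    """Run every task whose interval has elapsed since its last_run.

    Because eligibility is computed from timestamps rather than edge-triggered
    events, a skipped or deferred tick is harmless: the elapsed-time check
    simply catches up on the next loop iteration.
    """
    ran = []
    for name, (interval, fn) in tasks.items():
        if now - last_run.get(name, 0.0) >= interval:
            fn()
            last_run[name] = now
            ran.append(name)
    return ran
```

Dropping the eager first call just means the first `run_tick` happens one `CONSOLIDATION_TICK_SECONDS` later; any task that was due still runs then, because its `last_run` never advanced.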

## Out of scope

- A more involved fix would actively probe FalkorDB readiness with
retries before the first tick. That's a bigger change and arguably
belongs at the FalkorDB-client layer, not here. This PR is the minimal,
low-risk fix.
- The `discover_creative_associations` / clustering improvements live in
#163 and #164.

## Test plan

- [ ] Service starts cleanly with no eager tick log entry
- [ ] Worker loop fires its first tick after
`CONSOLIDATION_TICK_SECONDS`
- [ ] Forcing a tick via `POST /consolidate` still works immediately
- [ ] On a restart with a large RDB, no `LOADING Redis is loading the
dataset in memory` errors appear in consolidation logs

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>