fix(consolidation): skip eager first tick at startup to avoid FalkorDB load race #165
Merged
jack-arturo merged 1 commit into verygoodplugins:main on May 1, 2026
…B load race

When init_consolidation_scheduler() ran a tick immediately after spawning the worker thread, FalkorDB could still be loading its RDB snapshot from disk. Every Redis command in that window returns "LOADING Redis is loading the dataset in memory", so the eager tick fails, but the failure is caught and last_run timestamps get bumped, silently skipping the day's decay / creative / cluster work until tomorrow.

The bigger the corpus, the longer the RDB load, the more reliably this fires. On any restart-on-deploy host (Railway, Docker, systemd) with a few thousand memories, it hits every deploy.

Removing the eager tick is safe:

- The worker loop still fires within CONSOLIDATION_TICK_SECONDS (default 1h). For decay/creative/cluster intervals measured in days, a one-tick delay at startup is invisible.
- The scheduler is timestamp-driven (last_run per task), not edge-triggered. Nothing is "lost" by deferring: the next loop iteration picks up any missed intervals.
- Failure mode flips from "silent broken run" to "no run yet, will run shortly", which is strictly better.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
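
The failure mode described in this commit message can be illustrated with a minimal sketch. It assumes a scheduler that bumps last_run after every attempt, successful or not. The names run_tick and LoadingError, the task table, and the one-day intervals are hypothetical stand-ins, not the code in automem/consolidation/runtime_scheduler.py.

```python
class LoadingError(Exception):
    """Stand-in for the error a Redis client raises while the server loads its RDB."""


def run_tick(tasks, last_run, now, execute):
    """Run every task whose interval has elapsed since its last_run timestamp."""
    ran = []
    for name, interval in tasks.items():
        if now - last_run.get(name, 0.0) < interval:
            continue  # not due yet
        try:
            execute(name)
            ran.append(name)
        except LoadingError as exc:
            print(f"{name} failed: {exc}")
        # Assumed bug pattern: the timestamp is bumped even when the task failed.
        last_run[name] = now
    return ran


DAY = 86_400.0
tasks = {"decay": DAY, "creative": DAY, "cluster": DAY}  # hypothetical intervals


def still_loading(name):
    raise LoadingError("LOADING Redis is loading the dataset in memory")


def healthy(name):
    pass  # pretend the task succeeded


startup = 1_000_000.0

# With the eager tick: every task fails during the RDB load, yet last_run moves.
last_run = {}
eager = run_tick(tasks, last_run, startup, still_loading)
hour_later = run_tick(tasks, last_run, startup + 3600, healthy)

# Without the eager tick: the first loop iteration finds every task still due.
deferred = run_tick(tasks, {}, startup + 3600, healthy)
```

In this sketch, both eager and hour_later come back empty because the failed startup tick already marked every task as run, while deferred contains all three tasks. That is the difference between a "silent broken run" and "no run yet, will run shortly".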
Why
When init_consolidation_scheduler() runs a tick immediately after spawning the worker thread, FalkorDB can still be loading its RDB snapshot from disk. Every Redis command during that window returns "LOADING Redis is loading the dataset in memory".

The eager tick catches the error, logs it, and bumps last_run timestamps, silently skipping the day's decay / creative / cluster work until tomorrow. The bigger the corpus, the longer the RDB load, the more reliably this fires. On any restart-on-deploy host (Railway, Docker, systemd) with a few thousand memories, it hits every deploy.

What changes
One line changes in automem/consolidation/runtime_scheduler.py:100, which drops the eager run_consolidation_tick_fn() call after starting the worker thread and adds a comment explaining why.

Why this is safe
- The worker loop still fires within CONSOLIDATION_TICK_SECONDS (default 3600s = 1h). For decay/creative/cluster intervals measured in days, a one-tick startup delay is invisible.
- The scheduler is timestamp-driven (last_run per task), not edge-triggered. Missed intervals get picked up by the next loop iteration, so nothing is "lost" by deferring.

Out of scope
discover_creative_associations / clustering improvements live in feat(consolidation): expose cluster threshold and min size as env vars #163 and feat(scripts): safer reclassify_with_llm.py with provider flags + tighter prompt #164.

Test plan
- … CONSOLIDATION_TICK_SECONDS
- POST /consolidate still works immediately
- No "LOADING Redis is loading the dataset in memory" errors appear in consolidation logs
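
A minimal sketch of the startup behavior after this change, useful for checking it locally. It assumes a worker loop built on threading.Event.wait; the function init_scheduler, its parameters, and the 0.2s interval are hypothetical and stand in for init_consolidation_scheduler() and CONSOLIDATION_TICK_SECONDS, whose real signatures are not shown in this PR.

```python
import threading


def init_scheduler(tick_fn, tick_seconds, stop_event):
    """Start a background worker that calls tick_fn every tick_seconds."""

    def worker():
        # Event.wait returns False on timeout and True once stop_event is set,
        # so the loop sleeps one full interval before the first tick.
        while not stop_event.wait(tick_seconds):
            tick_fn()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # No eager tick_fn() call here: the first tick is left to the worker loop.
    return thread


calls = []
first_tick = threading.Event()
stop = threading.Event()


def tick():
    calls.append("tick")
    first_tick.set()


worker_thread = init_scheduler(tick, tick_seconds=0.2, stop_event=stop)
calls_at_startup = len(calls)          # nothing has run synchronously
fired = first_tick.wait(timeout=5.0)   # the worker loop fires the first tick
stop.set()
worker_thread.join(timeout=5.0)
```

Here calls_at_startup stays at zero because initialization no longer ticks synchronously, and fired becomes true once the worker loop reaches its first interval. A manual trigger such as POST /consolidate is a separate path and is not modeled in this sketch.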