perf(orchestrator): reduce memory usage and release heap to OS after tasks (#89)#90
Conversation
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: Payel-git-ol#89
Issue Payel-git-ol#89: idle RSS is ~20-30MB but climbs to 400MB-1.1GB during a task and does not come back. Go's GC does not eagerly return freed heap to the OS after a burst of large allocations (whole repos read into memory, big LLM prompts/responses, build-command output). Add internal/memory with: - Configure(): optional soft memory limit via ORCHESTRATOR_MEMORY_LIMIT_MIB (honours an explicit GOMEMLIMIT if already set). - ReleaseToOS(): runtime.GC() + debug.FreeOSMemory(), logging heap before/after. Call Configure() on startup and ReleaseToOS() after every task completes in CreateTaskStream so RSS drops back toward the idle level between tasks.
…it-ol#89) Two unbounded in-memory accumulations during task execution: 1. executeToolCommand buffered the FULL stdout+stderr of build commands (nix develop, npm install, cargo build, ...) in a strings.Builder. Such output can reach hundreds of MB and is only used for error diagnostics. Replace with a bounded buffer that retains the last 64KB (the tail, where failures surface) and frees the discarded head. 2. readProjectFiles fallback and detectNewFiles read every changed/untracked file into memory, including node_modules, target/, dist/ and other build artifacts. Filter these via util.IsIgnoredPath and prune ignored directories from the walk with filepath.SkipDir. Add tests for the bounded buffer and ignored-path filtering.
CI note: the one failing test predates this PR and is unrelated to memory
Root cause of that test: For this PR specifically: Happy to fix |
🤖 Solution Draft LogThis log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
📊 Context and tokens usage:Claude Opus 4.8: (2 sub-sessions)
Total: (12.3K new + 193.7K cache writes + 5.0M cache reads) input tokens, 44.2K output tokens, $4.879320 cost 🤖 Models used:
📎 Log file uploaded as Gist (2105KB)Now working session is ended, feel free to review and add any feedback on the solution draft. |
🔄 Auto-restart triggered (iteration 1)Reason: CI failures detected Starting new session to address the issues. Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations. |
…l-git-ol#89) The pre-existing TestContainsSourceCode failed because hasRealCode required >= 3 non-comment lines, rejecting legitimate short sources (a one-line express server, package main + func main). Rewrite hasRealCode to accept any file with at least one real code line while still rejecting empty, comment-only and stub-only files (TODO/FIXME, pass, ...). Add TestHasRealCode to lock in the behavior.
Working session summaryThe CI is now green. Here's a summary of what I did. Root causeThe failing CI check (
But FixIn I also added Verification
Note: I also added PR: #90 This summary was automatically extracted from the AI working session output. |
🔄 Auto-restart-until-mergeable Log (iteration 1)This log file contains the complete execution trace of the AI solution draft process. 💰 Cost estimation:
📊 Context and tokens usage:
Total: (3.4K new + 42.1K cache writes + 1.6M cache reads) input tokens, 12.9K output tokens, $1.409211 cost 🤖 Models used:
📎 Log file uploaded as Gist (3068KB)Now working session is ended, feel free to review and add any feedback on the solution draft. |
✅ Ready to mergeThis pull request is now ready to be merged:
Monitored by hive-mind with --auto-restart-until-mergeable flag |
This reverts commit 4ca517e.
306aec6
into
Payel-git-ol:optimization/memory-management
🧠 Memory optimization for the orchestrator
Fixes #89
Problem
The orchestrator sits at ~20–30 MB while idle, but RSS climbs to 400 MB – 1.1 GB during a task and does not return afterwards. The issue suspected leaking goroutines.
Root cause
The goroutines are actually fine — every
go funcis bounded by async.WaitGroup(assignManagersParallel,groupchat.runConcurrentRound) or a single drained channel (streamSender.loop). No goroutine leak was found.The real causes are bursty large allocations that Go's GC keeps resident:
executeToolCommandaccumulated the entire stdout+stderr of build commands (nix develop,npm install,cargo build, …) in astrings.Builder. That output can be hundreds of MB and is only used for error diagnostics.readProjectFiles(fallback) anddetectNewFilesread every changed/untracked file into a map — includingnode_modules/,target/,dist/, etc. For tool-mode (node/rust/…) projects that is hundreds of MB of dependency trees.Changes
internal/memory(new):Configure()— optional soft memory limit viaORCHESTRATOR_MEMORY_LIMIT_MIB(respects an explicitGOMEMLIMITif already set).ReleaseToOS()—runtime.GC()+debug.FreeOSMemory(), logging heap before/after.memory.Configure()at startup andmemory.ReleaseToOS()after every task finishes inCreateTaskStream, so RSS drops back toward idle between tasks.strings.Builderwith a bounded buffer that keeps only the last 64 KB (the tail, where failures appear) and frees the discarded head.node_modules,target,dist, …) viautil.IsIgnoredPathinreadProjectFiles(withfilepath.SkipDir) anddetectNewFiles.How to verify
[memory] освобождение памяти ОС: heap NNN МБ -> MM МБ (goroutines=…)and RSS returning toward the idle level.node_modules/…) are no longer pulled into memory.Tests
internal/memory:ReleaseToOSshrinks heap after a large allocation is dropped;Configureapplies / respects memory limits.internal/service/rules/worker: bounded buffer keeps the tail and frees the head;readProjectFilesskips ignored dirs.Note on CI
TestContainsSourceCodeininternal/service/rules/workerfails on the clean base branch too (verified by stashing all changes) — it expects 1–2 line snippets to count as "real code" whilehasRealCoderequires ≥3 non-stub lines (a guard added for issue #98). This is unrelated to memory and is intentionally left untouched to avoid regressing that guard.