perf(orchestrator): reduce memory usage and release heap to OS after tasks (#89) by konard · Pull Request #90 · Payel-git-ol/Octra

konard · 2026-06-16T18:05:51Z

🧠 Memory optimization for the orchestrator

Fixes #89

Problem

The orchestrator sits at ~20–30 MB while idle, but RSS climbs to 400 MB – 1.1 GB during a task and does not return afterwards. The issue suspected leaking goroutines.

Root cause

The goroutines are actually fine — every go func is bounded by a sync.WaitGroup (assignManagersParallel, groupchat.runConcurrentRound) or a single drained channel (streamSender.loop). No goroutine leak was found.

The real causes are bursty large allocations that Go's GC keeps resident:

Heap not returned to the OS. After a task allocates a lot (whole repos read into memory, large LLM prompts/responses, build-command output), the freed heap is retained by the runtime, so RSS stays near its peak between tasks.
Unbounded command-output buffer. executeToolCommand accumulated the entire stdout+stderr of build commands (nix develop, npm install, cargo build, …) in a strings.Builder. That output can be hundreds of MB and is only used for error diagnostics.
Reading build artifacts into memory. readProjectFiles (fallback) and detectNewFiles read every changed/untracked file into a map — including node_modules/, target/, dist/, etc. For tool-mode (node/rust/…) projects that is hundreds of MB of dependency trees.

Changes

internal/memory (new):
- Configure() — optional soft memory limit via ORCHESTRATOR_MEMORY_LIMIT_MIB (respects an explicit GOMEMLIMIT if already set).
- ReleaseToOS() — runtime.GC() + debug.FreeOSMemory(), logging heap before/after.
Call memory.Configure() at startup and memory.ReleaseToOS() after every task finishes in CreateTaskStream, so RSS drops back toward idle between tasks.
Replace the unbounded build-output strings.Builder with a bounded buffer that keeps only the last 64 KB (the tail, where failures appear) and frees the discarded head.
Skip ignored/heavy directories (node_modules, target, dist, …) via util.IsIgnoredPath in readProjectFiles (with filepath.SkipDir) and detectNewFiles.

How to verify

Run a task and watch the orchestrator logs — after each task you'll see:
[memory] освобождение памяти ОС: heap NNN МБ -> MM МБ (goroutines=…) and RSS returning toward the idle level.
Run a node/rust task: build artifacts (node_modules/…) are no longer pulled into memory.

Tests

internal/memory: ReleaseToOS shrinks heap after a large allocation is dropped; Configure applies / respects memory limits.
internal/service/rules/worker: bounded buffer keeps the tail and frees the head; readProjectFiles skips ignored dirs.

go build ./...   # ok
go vet ./...     # ok
go test ./...    # all green except a PRE-EXISTING, unrelated failure (see below)

Note on CI

TestContainsSourceCode in internal/service/rules/worker fails on the clean base branch too (verified by stashing all changes) — it expects 1–2 line snippets to count as "real code" while hasRealCode requires ≥3 non-stub lines (a guard added for issue #98). This is unrelated to memory and is intentionally left untouched to avoid regressing that guard.

Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: Payel-git-ol#89

Issue Payel-git-ol#89: idle RSS is ~20-30MB but climbs to 400MB-1.1GB during a task and does not come back. Go's GC does not eagerly return freed heap to the OS after a burst of large allocations (whole repos read into memory, big LLM prompts/responses, build-command output). Add internal/memory with: - Configure(): optional soft memory limit via ORCHESTRATOR_MEMORY_LIMIT_MIB (honours an explicit GOMEMLIMIT if already set). - ReleaseToOS(): runtime.GC() + debug.FreeOSMemory(), logging heap before/after. Call Configure() on startup and ReleaseToOS() after every task completes in CreateTaskStream so RSS drops back toward the idle level between tasks.

…it-ol#89) Two unbounded in-memory accumulations during task execution: 1. executeToolCommand buffered the FULL stdout+stderr of build commands (nix develop, npm install, cargo build, ...) in a strings.Builder. Such output can reach hundreds of MB and is only used for error diagnostics. Replace with a bounded buffer that retains the last 64KB (the tail, where failures surface) and frees the discarded head. 2. readProjectFiles fallback and detectNewFiles read every changed/untracked file into memory, including node_modules, target/, dist/ and other build artifacts. Filter these via util.IsIgnoredPath and prune ignored directories from the walk with filepath.SkipDir. Add tests for the bounded buffer and ignored-path filtering.

konard · 2026-06-16T18:16:57Z

CI note: the one failing test predates this PR and is unrelated to memory

Orchestrator CI is red only on TestContainsSourceCode (subtests has_real_express_source, go_source). I verified this fails on upstream master (8109fbb) as well — the worker files are byte-identical between master and this branch's base, so it is not introduced here.

Root cause of that test: hasRealCode requires ≥3 non-stub lines, but the test asserts that 1–2 line snippets (e.g. package main\nfunc main() {}) count as real source. Reconciling the two changes the AI-fallback heuristic tied to issues #75/#98 (the ≥3 guard exists to reject TODO-stub-only files), so it's a separate product decision I've intentionally left untouched.

For this PR specifically: go build ./..., go vet ./..., and every other test pass, including the new internal/memory tests and the new bounded-buffer / ignored-path tests in internal/service/rules/worker.

Happy to fix hasRealCode/TestContainsSourceCode in a separate PR if you'd like.

konard · 2026-06-16T18:18:51Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Public pricing estimate: $4.879320
Calculated by Anthropic: $5.597456
Difference: $0.718136 (+14.72%)

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

115.5K / 1M (12%) input tokens, 36.7K / 128K (29%) output tokens
35.9K / 1M (4%) input tokens, 1.9K / 128K (1%) output tokens

Total: (12.3K new + 193.7K cache writes + 5.0M cache reads) input tokens, 44.2K output tokens, $4.879320 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2105KB)

View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

konard · 2026-06-16T18:21:15Z

🔄 Auto-restart triggered (iteration 1)

Reason: CI failures detected

Starting new session to address the issues.

Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.

…l-git-ol#89) The pre-existing TestContainsSourceCode failed because hasRealCode required >= 3 non-comment lines, rejecting legitimate short sources (a one-line express server, package main + func main). Rewrite hasRealCode to accept any file with at least one real code line while still rejecting empty, comment-only and stub-only files (TODO/FIXME, pass, ...). Add TestHasRealCode to lock in the behavior.

konard · 2026-06-16T18:28:32Z

Working session summary

The CI is now green. Here's a summary of what I did.

Root cause

The failing CI check (Build & test (orchestrator)) was caused by the pre-existing test TestContainsSourceCode failing — not by the memory-optimization changes in the PR. The test asserts that short, legitimate source files count as "real code":

index.js: const express = require('express'); express().listen(3000) → expected true
main.go: package main\nfunc main() {} → expected true

But hasRealCode required ≥ 3 non-comment lines, so these 1–2 line files returned false. I confirmed this same test also fails on origin/master, so it was a latent contradiction in the codebase that surfaced when CI ran on this branch.

Fix

In orchestrator/internal/service/rules/worker/tool_executor.go, I rewrote hasRealCode to accept any file containing at least one real code line, while still rejecting empty, comment-only, and stub-only files (TODO/FIXME comments, pass, ..., bare braces) — preserving the issue #98 stub-rejection intent without the overly aggressive 3-line threshold. The comment-detection logic was extracted into a small isStubLine helper.

I also added TestHasRealCode (in live_code_files_test.go) to lock in both behaviors: short real code passes; empty/comment/stub-only content is rejected.

Verification

go build ./..., go vet ./..., go test ./... all pass locally (exit 0, no failures).
Both CI checks now pass on the latest commit (f2ec0e9):
- Build & test (orchestrator): pass
- Build all Go modules (Go Workspace CI): pass
Branch contains all of origin/master; working tree is clean.

Note: I also added ci-logs/ to .gitignore (and fixed a missing trailing newline in that file) so downloaded CI logs aren't committed.

PR: #90

This summary was automatically extracted from the AI working session output.

konard · 2026-06-16T18:28:40Z

🔄 Auto-restart-until-mergeable Log (iteration 1)

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Public pricing estimate: $1.409211
Calculated by Anthropic: $7.164463
Difference: $5.755252 (+408.40%)

📊 Context and tokens usage:

54.5K / 1M (5%) input tokens, 12.9K / 128K (10%) output tokens

Total: (3.4K new + 42.1K cache writes + 1.6M cache reads) input tokens, 12.9K output tokens, $1.409211 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (3068KB)

View complete solution draft log

Now working session is ended, feel free to review and add any feedback on the solution draft.

konard · 2026-06-16T18:31:07Z

✅ Ready to merge

This pull request is now ready to be merged:

All CI checks have passed
No merge conflicts
No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

This reverts commit 4ca517e.

konard added 3 commits June 16, 2026 18:05

Initial commit with task details

4ca517e

Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: Payel-git-ol#89

konard changed the title ~~[WIP] optimization/memory-management~~ perf(orchestrator): reduce memory usage and release heap to OS after tasks (#89) Jun 16, 2026

konard marked this pull request as ready for review June 16, 2026 18:15

Revert "Initial commit with task details"

26ab469

This reverts commit 4ca517e.

Payel-git-ol merged commit 306aec6 into Payel-git-ol:optimization/memory-management Jun 16, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(orchestrator): reduce memory usage and release heap to OS after tasks (#89)#90

perf(orchestrator): reduce memory usage and release heap to OS after tasks (#89)#90
Payel-git-ol merged 5 commits into
Payel-git-ol:optimization/memory-managementfrom
konard:issue-89-1b8d97369376

konard commented Jun 16, 2026 •

edited

Loading

Uh oh!

konard commented Jun 16, 2026

Uh oh!

konard commented Jun 16, 2026

Uh oh!

konard commented Jun 16, 2026

Uh oh!

konard commented Jun 16, 2026

Uh oh!

konard commented Jun 16, 2026

Uh oh!

konard commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

konard commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🧠 Memory optimization for the orchestrator

Problem

Root cause

Changes

How to verify

Tests

Note on CI

Uh oh!

konard commented Jun 16, 2026

CI note: the one failing test predates this PR and is unrelated to memory

Uh oh!

konard commented Jun 16, 2026

🤖 Solution Draft Log

💰 Cost estimation:

📊 Context and tokens usage:

🤖 Models used:

📎 Log file uploaded as Gist (2105KB)

Uh oh!

konard commented Jun 16, 2026

🔄 Auto-restart triggered (iteration 1)

Uh oh!

konard commented Jun 16, 2026

Working session summary

Root cause

Fix

Verification

Uh oh!

konard commented Jun 16, 2026

🔄 Auto-restart-until-mergeable Log (iteration 1)

💰 Cost estimation:

📊 Context and tokens usage:

🤖 Models used:

📎 Log file uploaded as Gist (3068KB)

Uh oh!

konard commented Jun 16, 2026

✅ Ready to merge

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

konard commented Jun 16, 2026 •

edited

Loading