feat(smc): add Structured Memory Compression engine (Phase 1) #3

Open
pythondatascrape wants to merge 21 commits into main from feature/smc-phase1

@pythondatascrape
Owner

Summary

  • Adds internal/smc/ package with core SMC engine: category schema, k-parameter controller, conversation matrix, rule-based decomposer, and preference signal types
  • Integrates SMC compression into the proxy handler as an alternative to the existing sliding-window compressor, enabled via config
  • Wires SMC configuration through config.yaml and adds --k CLI flag to engram serve

Details

New types (internal/smc/):

  • CategorySchema — defines named content categories (e.g., facts, preferences, instructions) with per-category k-parameter overrides
  • KController — manages compression ratio (k) globally and per-category
  • ConversationMatrix — stores decomposed conversation rows and serializes back to provider.Message format
  • RuleDecomposer — heuristic keyword-based decomposer that extracts category-relevant content from messages
  • PreferenceSignal — correction tracking type for future adaptive compression

Integration:

  • proxy.Handler.EnableSMC() activates matrix-based compression; falls back to sliding-window when SMC is disabled
  • SMCConfig added to ProxyConfig with sensible defaults (k=0.5, default 3-category schema)
  • E2E integration test demonstrates ~57% token reduction on multi-turn conversations

16 files changed, 1060 insertions(+), 3 deletions(-)

Test plan

  • All unit tests pass (internal/smc/, internal/config/, internal/proxy/)
  • E2E integration test verifies multi-turn compression and custom schemas
  • Race detector clean (go test -race)
  • go vet clean
  • Binary builds successfully

🤖 Generated with Claude Code

Erik Meyer and others added 21 commits April 8, 2026 07:10
Covers three optimizations: periodic codebook re-injection (--window flag),
system prompt token accounting in proxy, and enum default omission in
SerializeTurn. Includes testing plan and edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds background, per-task ownership and dependencies, code examples,
acceptance criteria, detailed testing matrix, regression checks,
edge case table with rationale, and rollout notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8 tasks with TDD steps: enum default parsing, SerializeTurn omission,
Definition() markers, proxy token accounting fix, handler windowSize
wiring, periodic re-injection gate, test updates, and integration verification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Parse enum defaults from schema (first value is default)
- Omit default enum values in SerializeTurn output
- Mark defaults with * in Definition() for LLM decoding
- Update tests for new compression behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ctxOrig and ctxComp now include len(req.System)/4, fixing the
statusline context chart which showed 1:1 because the system
prompt (largest payload component) was excluded from estimates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codebook definitions are now only injected on turn 0 and every
windowSize turns, matching the proxy's compression window. This avoids
redundant definitions in every prompt while ensuring the LLM can still
decode compressed history after old turns are rolled into
[CONTEXT_SUMMARY].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
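
The re-injection rule described in the commit message above can be sketched as a small predicate. This is a minimal sketch, not the PR's code: shouldInjectDefinitions is a hypothetical name, turns are assumed to be zero-based, and window sizes below 2 are assumed to mean "inject every turn", following the later commit that treats those sizes as no optimization.

```go
package main

import "fmt"

// shouldInjectDefinitions reports whether codebook definitions should be sent
// on the given zero-based turn. Hypothetical helper for illustration only.
func shouldInjectDefinitions(turn, windowSize int) bool {
	// Assumption: window sizes below 2 disable the optimization entirely.
	if windowSize < 2 {
		return true
	}
	// Turn 0 and every windowSize-th turn after it.
	return turn%windowSize == 0
}

func main() {
	for turn := 0; turn < 7; turn++ {
		fmt.Printf("turn %d inject=%v\n", turn, shouldInjectDefinitions(turn, 3))
	}
}
```

With a window of 3, definitions would go out on turns 0, 3, and 6 under these assumptions.
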
The benchmark was using &testing.T{} to call newTestDeps, which would
panic instead of cleanly failing if setup errors occurred. Inline the
setup using the benchmark's own *testing.B for proper error reporting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Window size 0 or 1 now returns messages unchanged in Compress(), matching
the spec intent of "no optimization, inject every turn." Previously 0
summarized everything and 1 kept only the last message, dropping recent
context.

Also adds engram serve --window N to override the config file value at
the CLI, forwarded through daemonize to the child process.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ession

Previously --window defaulted to 0, making it impossible to distinguish
"not set" from "explicitly disable compression." Now defaults to -1
(not set); --window 0 correctly sets windowSize=0 which the compressor
treats as no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
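
The "-1 means not set" convention from the commit message above can be illustrated with the standard library flag package. This is a sketch under assumptions: the real CLI may use a different flag library, and parseWindow, resolveWindow, and the config value of 10 are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// parseWindow parses args and returns the raw --window value, or -1 when the
// flag is absent. Hypothetical helper built on the stdlib flag package.
func parseWindow(args []string) (int, error) {
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	window := fs.Int("window", -1, "compression window size (-1 = use config value)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *window, nil
}

// resolveWindow lets an explicit flag value (including 0) override the config
// file, while the -1 sentinel falls back to the config value.
func resolveWindow(flagValue, configValue int) int {
	if flagValue < 0 {
		return configValue
	}
	return flagValue
}

func main() {
	const configWindow = 10 // hypothetical value loaded from config.yaml
	for _, args := range [][]string{{}, {"--window", "0"}, {"--window", "4"}} {
		w, err := parseWindow(args)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("args=%v effective window=%d\n", args, resolveWindow(w, configWindow))
	}
}
```

The sentinel keeps "flag omitted" and "--window 0" distinguishable, which a default of 0 cannot do.
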
…ions

Two bugs caused the statusline context column to show wrong values:

1. Savings percentage used K-rounded integers, so any sub-1K compressed
   value (e.g. 444 → 0K) displayed as 100% savings instead of ~78%.
   Fixed by computing percentage from raw values.

2. System prompt field was typed as string, but Claude Code sends it as
   a content-block array. Go silently unmarshaled to empty string,
   causing systemTokens=0 and wrong session fingerprint (SHA-256 of "").
   Fixed with json.RawMessage + extractSystem() handling both formats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
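
Both fixes in the commit message above can be sketched in a few lines. This is an illustration under assumptions, not the PR's implementation: the commit names extractSystem but does not show its body, so the version here (including joining text blocks with a newline) is a guess; savingsPercent is a hypothetical name; and the original count of 2000 tokens is made up to pair with the 444 from the message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// savingsPercent computes savings from raw token counts rather than from
// K-rounded integers. Hypothetical helper.
func savingsPercent(orig, comp int) float64 {
	if orig <= 0 {
		return 0
	}
	return 100 * (1 - float64(comp)/float64(orig))
}

// extractSystem accepts the system prompt either as a JSON string or as an
// array of content blocks and returns the text. Assumed behavior only.
func extractSystem(raw json.RawMessage) string {
	if len(raw) == 0 {
		return ""
	}
	// Plain string form: "system": "..."
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s
	}
	// Content-block form: "system": [{"type":"text","text":"..."}]
	var blocks []struct {
		Type string `json:"type"`
		Text string `json:"text"`
	}
	if err := json.Unmarshal(raw, &blocks); err != nil {
		return ""
	}
	parts := make([]string, 0, len(blocks))
	for _, b := range blocks {
		if b.Type == "text" {
			parts = append(parts, b.Text)
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	orig, comp := 2000, 444 // hypothetical raw token counts
	fmt.Printf("K-rounded compressed value: %dK\n", comp/1000)
	fmt.Printf("savings from raw values: %.1f%%\n", savingsPercent(orig, comp))

	fmt.Printf("%q\n", extractSystem(json.RawMessage(`"You are helpful."`)))
	fmt.Printf("%q\n", extractSystem(json.RawMessage(`[{"type":"text","text":"You are helpful."}]`)))
}
```

Integer division turns 444 into 0K, which is why a percentage derived from the rounded value reads as 100% while the raw values give roughly 78%.
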
Full roadmap design for Structured Matrix Compression based on
McKinsey and IBM architecture documents. Four phases: core SMC
engine, persistence/learning, edge/security, and federation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10-task TDD plan for core structured matrix compression engine.
Covers category schema, k-parameter, matrix, decomposer, config
integration, and proxy handler wiring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 9, 2026 11:19

Copilot AI left a comment

Pull request overview

Introduces a Phase 1 Structured Matrix Compression (SMC) engine and wires it into the proxy/serve path, alongside codebook/token-accounting optimizations to reduce context overhead and improve reporting accuracy.

Changes:

  • Added internal/smc package (schema, k-controller, conversation matrix, rule-based decomposer, preference signal) with unit + e2e tests.
  • Integrated SMC into the proxy (optional compression path), improved system-prompt parsing/token accounting, and added SMC wiring via engram serve flags/config.
  • Optimized codebook serialization by omitting default enum values and marking defaults in definitions; adjusted related tests and statusline percent computation.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| internal/smc/category.go | Adds configurable category schema, defaults, and validation. |
| internal/smc/category_test.go | Tests schema defaults/validation. |
| internal/smc/k.go | Adds k-controller for global/per-category k. |
| internal/smc/k_test.go | Tests k-controller behaviors. |
| internal/smc/matrix.go | Implements conversation matrix and serialization to provider messages. |
| internal/smc/matrix_test.go | Tests matrix append/serialization/token counting. |
| internal/smc/decompose.go | Implements Phase 1 rule-based decomposer. |
| internal/smc/decompose_test.go | Tests rule-based decomposition and k impact. |
| internal/smc/preference.go | Adds preference/correction signal type. |
| internal/smc/integration_test.go | Adds end-to-end SMC tests (multi-turn + custom schema). |
| internal/proxy/handler.go | Adds SMC mode, system prompt extraction, and token accounting updates. |
| internal/proxy/handler_test.go | Adds tests for system prompt token accounting, array-form system parsing, and SMC path. |
| internal/proxy/compressor.go | Treats window sizes <2 as “no compression”. |
| internal/proxy/proxy.go | Adds server helper to enable SMC on the underlying handler. |
| cmd/engram/serve.go | Adds --window and --k, wires SMC config/enablement into proxy startup. |
| internal/server/handler.go | Adds windowSize and periodic codebook-def injection gating. |
| internal/server/handler_test.go | Updates constructors and adds window-based reinjection tests. |
| internal/server/handler_bench_test.go | Updates benchmark to new handler constructor signature. |
| internal/config/config.go | Adds SMC config under proxy config with defaults. |
| internal/config/config_test.go | Tests SMC config defaults and custom category loading. |
| internal/context/codebook.go | Adds enum defaults, omits default enum fields in serialization, marks defaults in definitions. |
| internal/context/codebook_test.go | Updates/extends tests for new default omission/definition behavior. |
| internal/context/response_codebook_test.go | Updates expectations to reflect default omission behavior. |
| internal/context/history_test.go | Updates history test expectation after default omission changes. |
| internal/optimizer/format.go | Uses shared percent calculation for context saved %. |
| internal/optimizer/format_test.go | Adds test ensuring percent uses raw context values. |
| docs/superpowers/specs/* | Adds design specs for SMC + codebook optimizations. |
| docs/superpowers/plans/* | Adds implementation plans for SMC Phase 1 + codebook optimizations. |

Comment thread internal/proxy/handler.go
Comment on lines +307 to +346
// Decompose all completed exchanges (pairs) except the last pair.
alreadyDecomposed := matrix.Len()
pairs := len(messages) / 2
currentTail := messages[pairs*2:]

for i := alreadyDecomposed; i < pairs; i++ {
	userIdx := i * 2
	assistIdx := i*2 + 1
	if assistIdx >= len(messages) {
		break
	}
	exchange := smc.Exchange{
		UserMessage:      messageText(messages[userIdx]),
		AssistantMessage: messageText(messages[assistIdx]),
		TurnIndex:        i,
	}
	row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
	if err != nil {
		slog.Warn("smc: decomposition failed, keeping raw", "turn", i, "err", err)
		continue
	}
	matrix.Append(*row)
}

// Build output: matrix history messages + current raw tail
matrixMsgs := matrix.Messages()
result := make([]AnthropicMessage, 0, len(matrixMsgs)+len(currentTail)+2)

for _, m := range matrixMsgs {
	result = append(result, AnthropicMessage{Role: m.Role, Content: m.Content})
}
if len(result) > 0 {
	result = append(result, AnthropicMessage{Role: "assistant", Content: "[compressed history above]"})
}

// Append the current raw tail
if pairs > 0 && alreadyDecomposed < pairs {
	lastPairStart := (pairs - 1) * 2
	result = append(result, messages[lastPairStart:]...)
} else {

Copilot AI Apr 9, 2026

In smcCompress, the loop decomposes all complete user+assistant pairs (i < pairs), but later the function also appends the last pair raw (messages[lastPairStart:]). This duplicates the last completed exchange (once in the matrix, once raw) on the first call, and makes the “except the last pair” comment inaccurate. Consider decomposing only up to pairs-1 (or otherwise excluding the last complete pair) when you intend to keep that pair raw for recency.
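
One way to read the suggestion above is to compute a single split point so that every message is either decomposed or kept raw, never both. The following is a minimal sketch, assuming messages strictly alternate user/assistant starting with a user message; splitForCompression is a hypothetical helper and not part of the PR.

```go
package main

import "fmt"

// splitForCompression returns how many leading exchanges (user+assistant
// pairs) to decompose and the message index where the raw tail begins.
// The last complete pair, plus any trailing unpaired message, stays raw.
func splitForCompression(numMessages int) (decompose, tailStart int) {
	pairs := numMessages / 2
	if pairs == 0 {
		return 0, 0
	}
	decompose = pairs - 1     // exclude the last complete pair
	tailStart = decompose * 2 // first message that is kept raw
	return decompose, tailStart
}

func main() {
	for _, n := range []int{1, 2, 20, 21} {
		d, t := splitForCompression(n)
		fmt.Printf("messages=%d decompose pairs [0,%d) raw tail from index %d\n", n, d, t)
	}
}
```

Because the loop bound and the tail start derive from the same value, the last exchange cannot appear both in the matrix and in the raw tail under these assumptions.
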

Copilot uses AI. Check for mistakes.
Comment thread internal/proxy/handler.go
Comment on lines +305 to +329
matrix := h.getOrCreateMatrix(sessionID)

// Decompose all completed exchanges (pairs) except the last pair.
alreadyDecomposed := matrix.Len()
pairs := len(messages) / 2
currentTail := messages[pairs*2:]

for i := alreadyDecomposed; i < pairs; i++ {
	userIdx := i * 2
	assistIdx := i*2 + 1
	if assistIdx >= len(messages) {
		break
	}
	exchange := smc.Exchange{
		UserMessage:      messageText(messages[userIdx]),
		AssistantMessage: messageText(messages[assistIdx]),
		TurnIndex:        i,
	}
	row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
	if err != nil {
		slog.Warn("smc: decomposition failed, keeping raw", "turn", i, "err", err)
		continue
	}
	matrix.Append(*row)
}

Copilot AI Apr 9, 2026

smcCompress mutates the per-session ConversationMatrix (matrix.Append) without any synchronization. Since ServeHTTP can be called concurrently for the same session ID, this can race on the matrix’s underlying slice and corrupt history. Consider guarding per-session matrix mutation with a mutex (per matrix or held under smcMu) or otherwise ensuring requests for a session are serialized.
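
A per-matrix mutex is one of the options mentioned above. The sketch below is illustrative only: safeMatrix, row, and AppendIfNext are hypothetical stand-ins, not the PR's ConversationMatrix API. AppendIfNext also rejects out-of-order turns, so two concurrent requests for the same session cannot append the same turn twice.

```go
package main

import (
	"fmt"
	"sync"
)

// row is a stand-in for a decomposed conversation row.
type row struct{ TurnIndex int }

// safeMatrix guards its slice with a mutex. Hypothetical type.
type safeMatrix struct {
	mu   sync.Mutex
	rows []row
}

// AppendIfNext appends r only when it is the next expected turn, and reports
// whether the append happened.
func (m *safeMatrix) AppendIfNext(r row) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if r.TurnIndex != len(m.rows) {
		return false
	}
	m.rows = append(m.rows, r)
	return true
}

// Len returns the number of stored rows.
func (m *safeMatrix) Len() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.rows)
}

// fillConcurrently simulates several requests for the same session, each
// trying to append turns 0..turns-1.
func fillConcurrently(m *safeMatrix, workers, turns int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := 0; t < turns; t++ {
				m.AppendIfNext(row{TurnIndex: t})
			}
		}()
	}
	wg.Wait()
}

func main() {
	m := &safeMatrix{}
	fillConcurrently(m, 8, 10)
	fmt.Println("rows:", m.Len())
}
```

Running a variant of this under go test -race is a reasonable way to check that the guarded path stays race-free.
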

Comment thread internal/proxy/handler.go
	AssistantMessage: messageText(messages[assistIdx]),
	TurnIndex:        i,
}
row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)

Copilot AI Apr 9, 2026

smcCompress calls Decompose with context.Background(), so request cancellation/timeouts from the inbound HTTP request won’t propagate. Passing r.Context() through (e.g., add a ctx parameter to smcCompress) will make future decomposers (LLM-based / slower) behave correctly under client disconnects.

Suggested change
- row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
+ row, err := h.smcDecomposer.Decompose(ctx, exchange, h.smcSchema)
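
To show why the propagated context matters, here is a small sketch of a slow decomposer that stops when its context is cancelled. slowDecompose and newCancelledContext are hypothetical; the 200 ms delay merely stands in for a slower (for example LLM-based) decomposer.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowDecompose simulates a slow decomposer that honors cancellation.
func slowDecompose(ctx context.Context, text string) (string, error) {
	select {
	case <-time.After(200 * time.Millisecond):
		return "decomposed:" + text, nil
	case <-ctx.Done():
		// The caller went away; stop instead of finishing useless work.
		return "", ctx.Err()
	}
}

// newCancelledContext returns a context that is already cancelled, standing
// in for an HTTP request whose client has disconnected.
func newCancelledContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	_, err := slowDecompose(newCancelledContext(), "hello")
	fmt.Println("cancelled:", errors.Is(err, context.Canceled))

	out, err := slowDecompose(context.Background(), "hello")
	fmt.Println(out, err)
}
```

With context.Background() the call always runs to completion; with the request's context it can return early when the client disconnects.
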

Comment thread internal/proxy/handler.go
Comment on lines +45 to +52
// SMC fields — when smcEnabled is true, use matrix decomposition instead of
// windowed compression.
smcEnabled    bool
smcSchema     smc.CategorySchema
smcK          smc.KController
smcDecomposer smc.Decomposer
smcMatrices   map[string]*smc.ConversationMatrix // sessionID -> matrix
smcMu         sync.Mutex

Copilot AI Apr 9, 2026

smcMatrices is an unbounded in-memory map keyed by sessionID with no eviction/cleanup path. In long-running proxy processes (or if session IDs are derived from fingerprints), this can grow without bound and increase memory usage over time. Consider adding a cleanup strategy (TTL/LRU) or tying lifecycle to the existing session stats/retention behavior.
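
A TTL-based cleanup is one of the strategies named above. The sketch below is a hypothetical illustration, not the PR's design: matrixStore, touch, evictExpired, and the 30-minute TTL are all assumptions, and timestamps are passed in explicitly so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoTTL is an arbitrary idle timeout chosen for the example.
const demoTTL = 30 * time.Minute

// sessionEntry pairs a session's rows with the time it was last used.
type sessionEntry struct {
	rows     []string
	lastUsed time.Time
}

// matrixStore is a hypothetical TTL-bounded replacement for a bare map.
type matrixStore struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*sessionEntry
}

func newMatrixStore(ttl time.Duration) *matrixStore {
	return &matrixStore{ttl: ttl, entries: make(map[string]*sessionEntry)}
}

// touch returns the entry for sessionID, creating it if needed, and records
// now as its last use.
func (s *matrixStore) touch(sessionID string, now time.Time) *sessionEntry {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[sessionID]
	if !ok {
		e = &sessionEntry{}
		s.entries[sessionID] = e
	}
	e.lastUsed = now
	return e
}

// evictExpired removes entries idle for longer than ttl and returns how many
// were removed.
func (s *matrixStore) evictExpired(now time.Time) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for id, e := range s.entries {
		if now.Sub(e.lastUsed) > s.ttl {
			delete(s.entries, id)
			removed++
		}
	}
	return removed
}

// size returns the number of live sessions.
func (s *matrixStore) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.entries)
}

// at builds a deterministic timestamp, in minutes after the Unix epoch.
func at(minutes int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(minutes) * time.Minute)
}

func main() {
	store := newMatrixStore(demoTTL)
	store.touch("session-a", at(0))
	store.touch("session-b", at(20))
	fmt.Println("evicted:", store.evictExpired(at(45)), "remaining:", store.size())
}
```

In a long-running proxy, evictExpired would typically be driven by a ticker or called opportunistically on each request; either choice is outside what the review comment specifies.
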

Comment thread internal/proxy/handler.go
Comment on lines +277 to +284
// EnableSMC activates structured matrix compression, replacing windowed compression.
func (h *Handler) EnableSMC(schema smc.CategorySchema, k smc.KController) {
	h.smcEnabled = true
	h.smcSchema = schema
	h.smcK = k
	h.smcDecomposer = smc.NewRuleDecomposer(k)
	h.smcMatrices = make(map[string]*smc.ConversationMatrix)
}

Copilot AI Apr 9, 2026

EnableSMC mutates multiple Handler fields (smcEnabled/schema/k/decomposer/map) without any locking. If this is ever called after the proxy has started serving, it can race with ServeHTTP and/or drop existing matrices. Consider either (1) making EnableSMC safe for concurrent use (mutex/atomic + preserve existing matrices), or (2) documenting/enforcing that it must be called only during initialization before Start().
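
Option (2) above, enforcing initialization-only use, could look roughly like the sketch below. It is a simplified stand-in under assumptions: the handler type, its Start method, and the error value are hypothetical, and the real Handler has more fields. The sketch also keeps an existing matrix map rather than replacing it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyStarted is returned when EnableSMC is called too late.
var errAlreadyStarted = errors.New("EnableSMC must be called before Start")

// handler is a reduced, hypothetical stand-in for the proxy handler.
type handler struct {
	mu         sync.Mutex
	started    bool
	smcEnabled bool
	matrices   map[string][]string
}

// EnableSMC turns SMC on, refusing to do so once serving has begun.
func (h *handler) EnableSMC() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.started {
		return errAlreadyStarted
	}
	h.smcEnabled = true
	if h.matrices == nil { // preserve any matrices that already exist
		h.matrices = make(map[string][]string)
	}
	return nil
}

// Start marks the handler as serving.
func (h *handler) Start() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.started = true
}

func main() {
	h := &handler{}
	fmt.Println("before Start:", h.EnableSMC())
	h.Start()
	fmt.Println("after Start:", h.EnableSMC())
}
```

Returning an error makes a late call visible at startup instead of surfacing later as a data race.
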

Comment thread internal/proxy/handler.go
System string `json:"system"`
Stream bool `json:"stream"`
Messages []AnthropicMessage `json:"messages"`
RawSystem json.RawMessage `json:"system"`

Copilot AI Apr 9, 2026

anthropicRequest.RawSystem is a json.RawMessage without omitempty. If the incoming request omits system, RawSystem will remain nil and re-marshaling will emit "system": null, changing the wire payload. Consider adding omitempty (and/or only re-marshaling when you actually modify the request) to preserve the original shape.

Suggested change
- RawSystem json.RawMessage `json:"system"`
+ RawSystem json.RawMessage `json:"system,omitempty"`
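
The difference the tag makes can be checked with encoding/json directly. The two struct types below are hypothetical reductions of the request type to the single field under discussion.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withoutOmit mirrors the field as written in the PR.
type withoutOmit struct {
	RawSystem json.RawMessage `json:"system"`
}

// withOmit mirrors the field with the suggested omitempty option.
type withOmit struct {
	RawSystem json.RawMessage `json:"system,omitempty"`
}

// mustMarshal marshals v and panics on error, to keep the demo short.
func mustMarshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustMarshal(withoutOmit{}))                                   // {"system":null}
	fmt.Println(mustMarshal(withOmit{}))                                      // {}
	fmt.Println(mustMarshal(withOmit{RawSystem: json.RawMessage(`"hello"`)})) // {"system":"hello"}
}
```

A nil json.RawMessage marshals as null, so only the omitempty variant leaves a request without a system field unchanged on the wire.
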

Comment thread cmd/engram/serve.go
if kOverride >= 0 {
	smcCfg.K = kOverride
}
smcSchema := configToSMCSchema(smcCfg)

Copilot AI Apr 9, 2026

SMC schema values are converted from config and used without validation. Since CategorySchema.Validate exists, it would be safer to validate the schema (non-empty categories, no duplicates, no blank names) before enabling SMC and fail fast with a clear error if the config is invalid.

Suggested change
- smcSchema := configToSMCSchema(smcCfg)
+ smcSchema := configToSMCSchema(smcCfg)
+ if err := smcSchema.Validate(); err != nil {
+ 	return fmt.Errorf("invalid proxy SMC schema: %w", err)
+ }
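
For readers without the source at hand, the three checks listed above (non-empty categories, no duplicates, no blank names) might be implemented along these lines. The type layout here is an assumption; the PR's actual CategorySchema and Validate may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Category and CategorySchema are hypothetical reductions of the PR's types.
type Category struct{ Name string }

type CategorySchema struct{ Categories []Category }

// Validate rejects empty schemas, blank names, and duplicate names.
func (s CategorySchema) Validate() error {
	if len(s.Categories) == 0 {
		return errors.New("schema has no categories")
	}
	seen := make(map[string]struct{}, len(s.Categories))
	for i, c := range s.Categories {
		name := strings.TrimSpace(c.Name)
		if name == "" {
			return fmt.Errorf("category %d has a blank name", i)
		}
		if _, dup := seen[name]; dup {
			return fmt.Errorf("duplicate category %q", name)
		}
		seen[name] = struct{}{}
	}
	return nil
}

func main() {
	good := CategorySchema{Categories: []Category{{"facts"}, {"preferences"}, {"instructions"}}}
	bad := CategorySchema{Categories: []Category{{"facts"}, {"facts"}}}
	fmt.Println("good:", good.Validate())
	fmt.Println("bad:", bad.Validate())
}
```

Calling this before enabling SMC lets a malformed config fail at startup with a specific message.
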

Comment on lines +444 to +484
func TestHandler_SMCCompression(t *testing.T) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(200)
		json.NewEncoder(w).Encode(map[string]any{
			"content": []map[string]string{{"text": "ok", "type": "text"}},
			"role":    "assistant",
		})
	}))
	defer upstream.Close()

	schema := smc.DefaultSchema()
	kc := smc.NewKController(0.5, schema)
	h := NewHandler(10, t.TempDir(), upstream.URL)
	h.EnableSMC(schema, kc)

	messages := make([]AnthropicMessage, 20)
	for i := range messages {
		if i%2 == 0 {
			messages[i] = AnthropicMessage{Role: "user", Content: fmt.Sprintf("Please update file%d.go to add logging", i)}
		} else {
			messages[i] = AnthropicMessage{Role: "assistant", Content: fmt.Sprintf("Updated file%d.go with slog calls", i)}
		}
	}

	body, _ := json.Marshal(map[string]any{
		"messages": messages,
		"system":   "You are a helpful assistant.",
	})

	req := httptest.NewRequest(http.MethodPost, "/v1/messages", strings.NewReader(string(body)))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-api-key", "test-key")

	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, req)

	if rr.Code != 200 {
		t.Fatalf("expected 200, got %d: %s", rr.Code, rr.Body.String())
	}
}

Copilot AI Apr 9, 2026

TestHandler_SMCCompression only asserts the handler returns 200, but doesn’t assert that SMC compression actually changed the outbound messages payload (e.g., presence of intent= matrix rows and absence of [CONTEXT_SUMMARY], or that the message count shrinks/grows as expected). Since the upstream server can capture the request body, consider asserting on the received JSON to ensure the SMC path is really exercised.
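
The capture technique suggested above can be sketched with net/http/httptest. This is an illustration under assumptions: captureUpstream and forwardedMessageCount are hypothetical helpers, and the client posts straight to the fake upstream, whereas in the real test the proxy handler would sit in between and the assertions would target its rewritten payload.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// message is a minimal stand-in for the proxy's message type.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// captureUpstream starts a fake upstream that records the last request body.
// The returned getter should be called after the request has completed.
func captureUpstream() (*httptest.Server, func() []byte) {
	var mu sync.Mutex
	var body []byte
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		mu.Lock()
		body = b
		mu.Unlock()
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"role":"assistant"}`))
	}))
	get := func() []byte {
		mu.Lock()
		defer mu.Unlock()
		return body
	}
	return srv, get
}

// forwardedMessageCount sends msgs upstream and returns how many messages the
// upstream actually received.
func forwardedMessageCount(msgs []message) (int, error) {
	srv, got := captureUpstream()
	defer srv.Close()

	payload, err := json.Marshal(map[string]any{"messages": msgs})
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(srv.URL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, err
	}
	resp.Body.Close()

	var seen struct {
		Messages []message `json:"messages"`
	}
	if err := json.Unmarshal(got(), &seen); err != nil {
		return 0, err
	}
	return len(seen.Messages), nil
}

func main() {
	msgs := []message{{Role: "user", Content: "hi"}, {Role: "assistant", Content: "hello"}}
	n, err := forwardedMessageCount(msgs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("upstream saw", n, "messages")
}
```

In the PR's test, the decoded body could then be checked for the markers the comment mentions, such as matrix rows being present and [CONTEXT_SUMMARY] being absent.
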

Comment thread internal/smc/k.go
Comment on lines +34 to +40
// CompressionRatio returns the target compression ratio for a category.
// Formula: compression_ratio = 1 - (1 - minRatio) * k
// At k=0: returns 1.0 (maximum compression).
// At k=1: returns minRatio (minimum compression, maximum fidelity).
func (kc KController) CompressionRatio(category string, minRatio float64) float64 {
	k := kc.EffectiveK(category)
	return 1 - (1-minRatio)*k

Copilot AI Apr 9, 2026

KController.CompressionRatio’s formula and doc appear inverted relative to the rest of SMC: RuleDecomposer.compress treats higher k as less compression (keeps more text), but CompressionRatio currently decreases as k increases (k=0→1.0, k=1→minRatio) while the comments call k=0 “maximum compression”. Consider aligning this method with the compress() mapping (e.g., ratio = minRatio + (1-minRatio)*k) or renaming/re-documenting it so k has a consistent meaning across the package.

Suggested change
- // CompressionRatio returns the target compression ratio for a category.
- // Formula: compression_ratio = 1 - (1 - minRatio) * k
- // At k=0: returns 1.0 (maximum compression).
- // At k=1: returns minRatio (minimum compression, maximum fidelity).
- func (kc KController) CompressionRatio(category string, minRatio float64) float64 {
- 	k := kc.EffectiveK(category)
- 	return 1 - (1-minRatio)*k
+ // CompressionRatio returns the target retained-text ratio for a category.
+ // Formula: compression_ratio = minRatio + (1 - minRatio) * k
+ // At k=0: returns minRatio (maximum compression).
+ // At k=1: returns 1.0 (minimum compression, maximum fidelity).
+ func (kc KController) CompressionRatio(category string, minRatio float64) float64 {
+ 	k := kc.EffectiveK(category)
+ 	return minRatio + (1-minRatio)*k
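
Evaluating both formulas side by side makes the inversion concrete. The functions below are free-standing copies of the two expressions; minRatio = 0.25 is an arbitrary value chosen so the arithmetic is exact. Note that at k = 0.5, the PR's stated default, the two formulas give the same result, so tests that only use the default may not reveal the difference.

```go
package main

import "fmt"

// ratioAsWritten is the formula currently in the PR.
func ratioAsWritten(k, minRatio float64) float64 {
	return 1 - (1-minRatio)*k
}

// ratioSuggested is the formula from the suggested change.
func ratioSuggested(k, minRatio float64) float64 {
	return minRatio + (1-minRatio)*k
}

func main() {
	const minRatio = 0.25 // arbitrary example value
	for _, k := range []float64{0, 0.25, 0.5, 1} {
		fmt.Printf("k=%.2f as-written=%.4f suggested=%.4f\n",
			k, ratioAsWritten(k, minRatio), ratioSuggested(k, minRatio))
	}
}
```

The suggested form moves in the same direction as compress(), where a higher k retains more text.
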

Comment thread internal/smc/decompose.go
Comment on lines +89 to +99
targetRatio := 0.1 + 0.9*k
targetLen := int(float64(len([]rune(text))) * targetRatio)
if targetLen < 1 {
	targetLen = 1
}

runes := []rune(text)
if len(runes) <= targetLen {
	return text
}
return string(runes[:targetLen])

Copilot AI Apr 9, 2026

RuleDecomposer.compress converts the input to []rune twice (len([]rune(text)) and then runes := []rune(text)), which doubles allocations for non-empty text. Consider converting once and reusing the slice for both length calculation and truncation.
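
A single-conversion version of the excerpt might look like the sketch below. It is written as a free function with k passed in, which is an assumption; in the PR the logic lives in a RuleDecomposer method and k comes from the controller.

```go
package main

import "fmt"

// compress truncates text to a k-dependent fraction of its rune length,
// converting to []rune only once.
func compress(text string, k float64) string {
	runes := []rune(text) // single conversion, reused below
	targetRatio := 0.1 + 0.9*k
	targetLen := int(float64(len(runes)) * targetRatio)
	if targetLen < 1 {
		targetLen = 1
	}
	if len(runes) <= targetLen {
		return text
	}
	return string(runes[:targetLen])
}

func main() {
	fmt.Printf("%q\n", compress("hello world", 0.5))
	fmt.Printf("%q\n", compress("héllo", 0.5))
	fmt.Printf("%q\n", compress("abc", 0))
}
```

Truncating on runes rather than bytes keeps multi-byte characters such as "é" intact at the cut point.
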


2 participants