feat(smc): add Structured Memory Compression engine (Phase 1) #3

pythondatascrape wants to merge 21 commits into main
Conversation
Covers three optimizations: periodic codebook re-injection (--window flag), system prompt token accounting in proxy, and enum default omission in SerializeTurn. Includes testing plan and edge cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds background, per-task ownership and dependencies, code examples, acceptance criteria, detailed testing matrix, regression checks, edge case table with rationale, and rollout notes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8 tasks with TDD steps: enum default parsing, SerializeTurn omission, Definition() markers, proxy token accounting fix, handler windowSize wiring, periodic re-injection gate, test updates, and integration verification. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Parse enum defaults from schema (first value is default) - Omit default enum values in SerializeTurn output - Mark defaults with * in Definition() for LLM decoding - Update tests for new compression behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
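The omission scheme described above can be sketched with hypothetical helpers (the real SerializeTurn operates on the codebook's own types, and the actual output format may differ):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// serializeEnums renders key=value pairs for a compressed turn, omitting any
// field whose value equals its schema default. The decoder treats absence as
// the default, so no information is lost.
func serializeEnums(fields, defaults map[string]string) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	var parts []string
	for _, k := range keys {
		if v := fields[k]; v != defaults[k] {
			parts = append(parts, k+"="+v)
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	// "status" matches its default and is omitted; only "mode" is emitted.
	fmt.Println(serializeEnums(
		map[string]string{"status": "ok", "mode": "fast"},
		map[string]string{"status": "ok", "mode": "slow"},
	))
}
```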
ctxOrig and ctxComp now include len(req.System)/4, fixing the statusline context chart which showed 1:1 because the system prompt (largest payload component) was excluded from estimates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codebook definitions are now only injected on turn 0 and every windowSize turns, matching the proxy's compression window. This avoids redundant definitions in every prompt while ensuring the LLM can still decode compressed history after old turns are rolled into [CONTEXT_SUMMARY]. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
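A minimal sketch of that injection gate (hypothetical helper name; the actual handler wiring differs):

```go
package main

import "fmt"

// shouldInjectDefs reports whether codebook definitions belong in the prompt
// for a given turn: always on turn 0, then every windowSize turns. A window
// below 2 means "no optimization, inject every turn".
func shouldInjectDefs(turn, windowSize int) bool {
	if windowSize < 2 {
		return true
	}
	return turn == 0 || turn%windowSize == 0
}

func main() {
	for _, turn := range []int{0, 1, 3, 4, 8} {
		fmt.Printf("turn=%d inject=%v\n", turn, shouldInjectDefs(turn, 4))
	}
}
```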
The benchmark was using &testing.T{} to call newTestDeps, which would
panic instead of cleanly failing if setup errors occurred. Inline the
setup using the benchmark's own *testing.B for proper error reporting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Window size 0 or 1 now returns messages unchanged in Compress(), matching the spec intent of "no optimization, inject every turn." Previously 0 summarized everything and 1 kept only the last message, dropping recent context. Also adds engram serve --window N to override the config file value at the CLI, forwarded through daemonize to the child process. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ession Previously --window defaulted to 0, making it impossible to distinguish "not set" from "explicitly disable compression." Now defaults to -1 (not set); --window 0 correctly sets windowSize=0 which the compressor treats as no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
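The sentinel logic can be sketched as follows, assuming a hypothetical resolveWindow helper (the real flag wiring lives in the serve command and may differ):

```go
package main

import (
	"flag"
	"fmt"
)

// resolveWindow merges the CLI override with the config-file value.
// -1 is the "not set" sentinel, so an explicit --window 0 (disable
// compression) is distinguishable from omitting the flag entirely.
func resolveWindow(cliWindow, configWindow int) int {
	if cliWindow >= 0 {
		return cliWindow
	}
	return configWindow
}

func main() {
	window := flag.Int("window", -1, "compression window (-1 = use config value)")
	flag.Parse()
	fmt.Println("effective window:", resolveWindow(*window, 10))
}
```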
…ions Two bugs caused the statusline context column to show wrong values: 1. Savings percentage used K-rounded integers, so any sub-1K compressed value (e.g. 444 → 0K) displayed as 100% savings instead of ~78%. Fixed by computing percentage from raw values. 2. System prompt field was typed as string, but Claude Code sends it as a content-block array. Go silently unmarshaled to empty string, causing systemTokens=0 and wrong session fingerprint (SHA-256 of ""). Fixed with json.RawMessage + extractSystem() handling both formats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full roadmap design for Structured Matrix Compression based on McKinsey and IBM architecture documents. Four phases: core SMC engine, persistence/learning, edge/security, and federation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10-task TDD plan for core structured matrix compression engine. Covers category schema, k-parameter, matrix, decomposer, config integration, and proxy handler wiring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Introduces a Phase 1 Structured Matrix Compression (SMC) engine and wires it into the proxy/serve path, alongside codebook/token-accounting optimizations to reduce context overhead and improve reporting accuracy.
Changes:
- Added internal/smc package (schema, k-controller, conversation matrix, rule-based decomposer, preference signal) with unit + e2e tests.
- Integrated SMC into the proxy (optional compression path), improved system-prompt parsing/token accounting, and added SMC wiring via engram serve flags/config.
- Optimized codebook serialization by omitting default enum values and marking defaults in definitions; adjusted related tests and statusline percent computation.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| internal/smc/category.go | Adds configurable category schema, defaults, and validation. |
| internal/smc/category_test.go | Tests schema defaults/validation. |
| internal/smc/k.go | Adds k-controller for global/per-category k. |
| internal/smc/k_test.go | Tests k-controller behaviors. |
| internal/smc/matrix.go | Implements conversation matrix and serialization to provider messages. |
| internal/smc/matrix_test.go | Tests matrix append/serialization/token counting. |
| internal/smc/decompose.go | Implements Phase 1 rule-based decomposer. |
| internal/smc/decompose_test.go | Tests rule-based decomposition and k impact. |
| internal/smc/preference.go | Adds preference/correction signal type. |
| internal/smc/integration_test.go | Adds end-to-end SMC tests (multi-turn + custom schema). |
| internal/proxy/handler.go | Adds SMC mode, system prompt extraction, and token accounting updates. |
| internal/proxy/handler_test.go | Adds tests for system prompt token accounting, array-form system parsing, and SMC path. |
| internal/proxy/compressor.go | Treats window sizes <2 as "no compression". |
| internal/proxy/proxy.go | Adds server helper to enable SMC on the underlying handler. |
| cmd/engram/serve.go | Adds --window and --k, wires SMC config/enablement into proxy startup. |
| internal/server/handler.go | Adds windowSize and periodic codebook-def injection gating. |
| internal/server/handler_test.go | Updates constructors and adds window-based reinjection tests. |
| internal/server/handler_bench_test.go | Updates benchmark to new handler constructor signature. |
| internal/config/config.go | Adds SMC config under proxy config with defaults. |
| internal/config/config_test.go | Tests SMC config defaults and custom category loading. |
| internal/context/codebook.go | Adds enum defaults, omits default enum fields in serialization, marks defaults in definitions. |
| internal/context/codebook_test.go | Updates/extends tests for new default omission/definition behavior. |
| internal/context/response_codebook_test.go | Updates expectations to reflect default omission behavior. |
| internal/context/history_test.go | Updates history test expectation after default omission changes. |
| internal/optimizer/format.go | Uses shared percent calculation for context saved %. |
| internal/optimizer/format_test.go | Adds test ensuring percent uses raw context values. |
| docs/superpowers/specs/* | Adds design specs for SMC + codebook optimizations. |
| docs/superpowers/plans/* | Adds implementation plans for SMC Phase 1 + codebook optimizations. |
```go
// Decompose all completed exchanges (pairs) except the last pair.
alreadyDecomposed := matrix.Len()
pairs := len(messages) / 2
currentTail := messages[pairs*2:]

for i := alreadyDecomposed; i < pairs; i++ {
	userIdx := i * 2
	assistIdx := i*2 + 1
	if assistIdx >= len(messages) {
		break
	}
	exchange := smc.Exchange{
		UserMessage:      messageText(messages[userIdx]),
		AssistantMessage: messageText(messages[assistIdx]),
		TurnIndex:        i,
	}
	row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
	if err != nil {
		slog.Warn("smc: decomposition failed, keeping raw", "turn", i, "err", err)
		continue
	}
	matrix.Append(*row)
}

// Build output: matrix history messages + current raw tail
matrixMsgs := matrix.Messages()
result := make([]AnthropicMessage, 0, len(matrixMsgs)+len(currentTail)+2)

for _, m := range matrixMsgs {
	result = append(result, AnthropicMessage{Role: m.Role, Content: m.Content})
}
if len(result) > 0 {
	result = append(result, AnthropicMessage{Role: "assistant", Content: "[compressed history above]"})
}

// Append the current raw tail
if pairs > 0 && alreadyDecomposed < pairs {
	lastPairStart := (pairs - 1) * 2
	result = append(result, messages[lastPairStart:]...)
} else {
```
In smcCompress, the loop decomposes all complete user+assistant pairs (i < pairs), but later the function also appends the last pair raw (messages[lastPairStart:]). This duplicates the last completed exchange (once in the matrix, once raw) on the first call, and makes the “except the last pair” comment inaccurate. Consider decomposing only up to pairs-1 (or otherwise excluding the last complete pair) when you intend to keep that pair raw for recency.
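One way to express the suggested fix, as a hypothetical standalone helper (the real code would inline this bound in smcCompress):

```go
package main

import "fmt"

// decomposeRange returns the half-open range [from, to) of pair indices to
// decompose, excluding the last complete pair so it can be re-sent raw for
// recency without also appearing in the matrix.
func decomposeRange(alreadyDecomposed, pairs int) (from, to int) {
	to = pairs - 1 // keep the final complete pair out of the matrix
	if to < alreadyDecomposed {
		to = alreadyDecomposed // nothing new to decompose
	}
	return alreadyDecomposed, to
}

func main() {
	from, to := decomposeRange(0, 5)
	fmt.Println(from, to) // decompose pairs 0..3, keep pair 4 raw
}
```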
```go
matrix := h.getOrCreateMatrix(sessionID)

// Decompose all completed exchanges (pairs) except the last pair.
alreadyDecomposed := matrix.Len()
pairs := len(messages) / 2
currentTail := messages[pairs*2:]

for i := alreadyDecomposed; i < pairs; i++ {
	userIdx := i * 2
	assistIdx := i*2 + 1
	if assistIdx >= len(messages) {
		break
	}
	exchange := smc.Exchange{
		UserMessage:      messageText(messages[userIdx]),
		AssistantMessage: messageText(messages[assistIdx]),
		TurnIndex:        i,
	}
	row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
	if err != nil {
		slog.Warn("smc: decomposition failed, keeping raw", "turn", i, "err", err)
		continue
	}
	matrix.Append(*row)
}
```
smcCompress mutates the per-session ConversationMatrix (matrix.Append) without any synchronization. Since ServeHTTP can be called concurrently for the same session ID, this can race on the matrix’s underlying slice and corrupt history. Consider guarding per-session matrix mutation with a mutex (per matrix or held under smcMu) or otherwise ensuring requests for a session are serialized.
```go
		AssistantMessage: messageText(messages[assistIdx]),
		TurnIndex:        i,
	}
	row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
```
smcCompress calls Decompose with context.Background(), so request cancellation/timeouts from the inbound HTTP request won’t propagate. Passing r.Context() through (e.g., add a ctx parameter to smcCompress) will make future decomposers (LLM-based / slower) behave correctly under client disconnects.
Suggested change:

```diff
- row, err := h.smcDecomposer.Decompose(context.Background(), exchange, h.smcSchema)
+ row, err := h.smcDecomposer.Decompose(ctx, exchange, h.smcSchema)
```
```go
// SMC fields — when smcEnabled is true, use matrix decomposition instead of
// windowed compression.
smcEnabled    bool
smcSchema     smc.CategorySchema
smcK          smc.KController
smcDecomposer smc.Decomposer
smcMatrices   map[string]*smc.ConversationMatrix // sessionID -> matrix
smcMu         sync.Mutex
```
smcMatrices is an unbounded in-memory map keyed by sessionID with no eviction/cleanup path. In long-running proxy processes (or if session IDs are derived from fingerprints), this can grow without bound and increase memory usage over time. Consider adding a cleanup strategy (TTL/LRU) or tying lifecycle to the existing session stats/retention behavior.
```go
// EnableSMC activates structured matrix compression, replacing windowed compression.
func (h *Handler) EnableSMC(schema smc.CategorySchema, k smc.KController) {
	h.smcEnabled = true
	h.smcSchema = schema
	h.smcK = k
	h.smcDecomposer = smc.NewRuleDecomposer(k)
	h.smcMatrices = make(map[string]*smc.ConversationMatrix)
}
```
EnableSMC mutates multiple Handler fields (smcEnabled/schema/k/decomposer/map) without any locking. If this is ever called after the proxy has started serving, it can race with ServeHTTP and/or drop existing matrices. Consider either (1) making EnableSMC safe for concurrent use (mutex/atomic + preserve existing matrices), or (2) documenting/enforcing that it must be called only during initialization before Start().
```go
System    string             `json:"system"`
Stream    bool               `json:"stream"`
Messages  []AnthropicMessage `json:"messages"`
RawSystem json.RawMessage    `json:"system"`
```
anthropicRequest.RawSystem is a json.RawMessage without omitempty. If the incoming request omits system, RawSystem will remain nil and re-marshaling will emit "system": null, changing the wire payload. Consider adding omitempty (and/or only re-marshaling when you actually modify the request) to preserve the original shape.
Suggested change:

```diff
- RawSystem json.RawMessage `json:"system"`
+ RawSystem json.RawMessage `json:"system,omitempty"`
```
```go
if kOverride >= 0 {
	smcCfg.K = kOverride
}
smcSchema := configToSMCSchema(smcCfg)
```
SMC schema values are converted from config and used without validation. Since CategorySchema.Validate exists, it would be safer to validate the schema (non-empty categories, no duplicates, no blank names) before enabling SMC and fail fast with a clear error if the config is invalid.
Suggested change:

```diff
  smcSchema := configToSMCSchema(smcCfg)
+ if err := smcSchema.Validate(); err != nil {
+ 	return fmt.Errorf("invalid proxy SMC schema: %w", err)
+ }
```
```go
func TestHandler_SMCCompression(t *testing.T) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(200)
		json.NewEncoder(w).Encode(map[string]any{
			"content": []map[string]string{{"text": "ok", "type": "text"}},
			"role":    "assistant",
		})
	}))
	defer upstream.Close()

	schema := smc.DefaultSchema()
	kc := smc.NewKController(0.5, schema)
	h := NewHandler(10, t.TempDir(), upstream.URL)
	h.EnableSMC(schema, kc)

	messages := make([]AnthropicMessage, 20)
	for i := range messages {
		if i%2 == 0 {
			messages[i] = AnthropicMessage{Role: "user", Content: fmt.Sprintf("Please update file%d.go to add logging", i)}
		} else {
			messages[i] = AnthropicMessage{Role: "assistant", Content: fmt.Sprintf("Updated file%d.go with slog calls", i)}
		}
	}

	body, _ := json.Marshal(map[string]any{
		"messages": messages,
		"system":   "You are a helpful assistant.",
	})

	req := httptest.NewRequest(http.MethodPost, "/v1/messages", strings.NewReader(string(body)))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-api-key", "test-key")

	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, req)

	if rr.Code != 200 {
		t.Fatalf("expected 200, got %d: %s", rr.Code, rr.Body.String())
	}
}
```
TestHandler_SMCCompression only asserts the handler returns 200, but doesn’t assert that SMC compression actually changed the outbound messages payload (e.g., presence of intent= matrix rows and absence of [CONTEXT_SUMMARY], or that the message count shrinks/grows as expected). Since the upstream server can capture the request body, consider asserting on the received JSON to ensure the SMC path is really exercised.
```go
// CompressionRatio returns the target compression ratio for a category.
// Formula: compression_ratio = 1 - (1 - minRatio) * k
// At k=0: returns 1.0 (maximum compression).
// At k=1: returns minRatio (minimum compression, maximum fidelity).
func (kc KController) CompressionRatio(category string, minRatio float64) float64 {
	k := kc.EffectiveK(category)
	return 1 - (1-minRatio)*k
```
KController.CompressionRatio’s formula and doc appear inverted relative to the rest of SMC: RuleDecomposer.compress treats higher k as less compression (keeps more text), but CompressionRatio currently decreases as k increases (k=0→1.0, k=1→minRatio) while the comments call k=0 “maximum compression”. Consider aligning this method with the compress() mapping (e.g., ratio = minRatio + (1-minRatio)*k) or renaming/re-documenting it so k has a consistent meaning across the package.
Suggested change:

```diff
- // CompressionRatio returns the target compression ratio for a category.
- // Formula: compression_ratio = 1 - (1 - minRatio) * k
- // At k=0: returns 1.0 (maximum compression).
- // At k=1: returns minRatio (minimum compression, maximum fidelity).
- func (kc KController) CompressionRatio(category string, minRatio float64) float64 {
- 	k := kc.EffectiveK(category)
- 	return 1 - (1-minRatio)*k
+ // CompressionRatio returns the target retained-text ratio for a category.
+ // Formula: compression_ratio = minRatio + (1 - minRatio) * k
+ // At k=0: returns minRatio (maximum compression).
+ // At k=1: returns 1.0 (minimum compression, maximum fidelity).
+ func (kc KController) CompressionRatio(category string, minRatio float64) float64 {
+ 	k := kc.EffectiveK(category)
+ 	return minRatio + (1-minRatio)*k
```
```go
targetRatio := 0.1 + 0.9*k
targetLen := int(float64(len([]rune(text))) * targetRatio)
if targetLen < 1 {
	targetLen = 1
}

runes := []rune(text)
if len(runes) <= targetLen {
	return text
}
return string(runes[:targetLen])
```
RuleDecomposer.compress converts the input to []rune twice (len([]rune(text)) and then runes := []rune(text)), which doubles allocations for non-empty text. Consider converting once and reusing the slice for both length calculation and truncation.
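The single-conversion version might look like this (a sketch, not the package's actual method):

```go
package main

import "fmt"

// compress truncates text to a k-scaled ratio, converting to []rune exactly
// once so the length calculation and the slice share one allocation.
func compress(text string, k float64) string {
	runes := []rune(text)
	targetLen := int(float64(len(runes)) * (0.1 + 0.9*k))
	if targetLen < 1 {
		targetLen = 1
	}
	if len(runes) <= targetLen {
		return text
	}
	return string(runes[:targetLen])
}

func main() {
	fmt.Println(compress("hello world", 0.0))
}
```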
Summary
- New internal/smc package with core SMC engine: category schema, k-parameter controller, conversation matrix, rule-based decomposer, and preference signal types
- SMC settings wired through config.yaml, plus a new --k CLI flag for engram serve

Details

New types (internal/smc/):
- CategorySchema: defines named content categories (e.g., facts, preferences, instructions) with per-category k-parameter overrides
- KController: manages compression ratio (k) globally and per-category
- ConversationMatrix: stores decomposed conversation rows and serializes back to provider.Message format
- RuleDecomposer: heuristic keyword-based decomposer that extracts category-relevant content from messages
- PreferenceSignal: correction tracking type for future adaptive compression

Integration:
- proxy.Handler.EnableSMC() activates matrix-based compression; falls back to sliding-window when SMC is disabled
- SMCConfig added to ProxyConfig with sensible defaults (k=0.5, default 3-category schema)

16 files changed, 1060 insertions(+), 3 deletions(-)
Test plan
- Unit tests for the new/changed packages (internal/smc/, internal/config/, internal/proxy/)
- Full suite passes under the race detector (go test -race)
- go vet clean

🤖 Generated with Claude Code