This repository was archived by the owner on Apr 29, 2026. It is now read-only.

Data Store Interface for Summary Caching #11

@IBJunior

Problem Statement

What problem does this feature solve?

The current SummarizeCompressor implementation has a significant performance and cost issue. Every time compression is triggered, it re-summarizes the entire message history from scratch, even if portions of that history have been previously summarized.

Specific issues:

  • Performance: Lines 97-111 in src/strategies/summarize.ts always process the complete message range, leading to redundant AI model calls
  • Cost: Repeated summarization of the same content increases API costs unnecessarily
  • Inefficiency: The original goal of summarization (reduce hallucination and save costs) is undermined by this approach

Current behavior:

// Every compression call processes ALL messages in range
const messagesToSummarize = messages.slice(summarizeStart, keepTailStart);
const conversationText = messagesToSummarize
  .map((msg) => `${msg.role}: ${msg.content}`)
  .join('\n---\n');

Proposed Solution

High-level approach to solving the problem

Introduce a data store interface that allows caching of previously generated summaries, following the library's "Bring Your Own Model" (BYOM) pattern with "Bring Your Own Store" (BYOS).

Key components:

  1. SlimContextStore interface for storage abstraction
  2. Enhanced message identification system with thread/conversation IDs
  3. Intelligent cache key strategy using conversation context + message ranges
  4. Modified SummarizeCompressor to check cache before generating new summaries
  5. Optional InMemoryStore implementation for testing and learning purposes

Technical Details

Implementation considerations

New interfaces needed:

interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Conversation wrapper to avoid repetitive threadId on each message
interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
  metadata?: Record<string, unknown>;
}

// Keep SlimContextMessage clean and focused (no repetitive threadId)
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
  metadata?: Record<string, unknown>;
  id?: string;        // Optional message identifier
  index?: number;     // Position within conversation
}

interface CacheKey {
  threadId: string;
  type: 'summary' | 'message';
  startIndex: number;
  endIndex?: number;     // For range summaries
}
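
As a rough illustration of how small a store can be, here is a minimal sketch of the optional InMemoryStore mentioned above. It assumes a plain Map as backing storage; the class body is an assumption for discussion, not a final design.

```typescript
// Store interface as proposed in this issue.
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Hypothetical reference implementation backed by a Map.
// Entries live only as long as the process; intended for tests and learning.
class InMemoryStore implements SlimContextStore {
  private readonly entries = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    // Map.get returns undefined on a miss; the interface expects null.
    return this.entries.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.entries.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }
}
```

A Redis or database-backed store would expose the same three methods, so the compressor would not need to know which backend is in use.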

Cache key strategy:

  • Format: "thread_{threadId}:summary:{startIndex}-{endIndex}"
  • Example: "thread_123:summary:5-15" (summary of messages 5-15 in thread 123)
  • Avoids using message content as keys (inefficient for long messages)
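
A possible key builder for the format above could look like the following sketch. The function name formatCacheKey is hypothetical, and the single-index form used when endIndex is omitted is an assumption, since the issue only specifies the range format.

```typescript
// CacheKey shape as proposed in this issue.
interface CacheKey {
  threadId: string;
  type: 'summary' | 'message';
  startIndex: number;
  endIndex?: number; // For range summaries
}

// Hypothetical helper: serializes a CacheKey into the proposed string format.
function formatCacheKey(key: CacheKey): string {
  // Assumption: a key without endIndex is rendered as a single index.
  const range =
    key.endIndex === undefined
      ? `${key.startIndex}`
      : `${key.startIndex}-${key.endIndex}`;
  return `thread_${key.threadId}:${key.type}:${range}`;
}

const exampleKey = formatCacheKey({
  threadId: '123',
  type: 'summary',
  startIndex: 5,
  endIndex: 15,
});
// exampleKey === "thread_123:summary:5-15"
```

This sketch only builds keys. If keys ever need to be parsed back, a threadId containing ":" would be ambiguous, so the thread ID would have to be escaped or restricted.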

Backward Compatibility Strategy:

// Enhanced compressor interface with method overloading
interface SlimContextCompressor {
  // Existing method - maintains full backward compatibility
  compress(messages: SlimContextMessage[]): Promise<SlimContextMessage[]>;

  // New method - accepts conversation wrapper for enhanced functionality
  compress(conversation: SlimContextConversation): Promise<SlimContextConversation>;
}

// Utility functions for format conversion
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation;
function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[];
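
One way the conversion utilities and the overload dispatch might be implemented is sketched below. PassThroughCompressor is a hypothetical stand-in that returns its input unchanged; it exists only to show how a single implementation can serve both overloads by checking Array.isArray.

```typescript
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
  metadata?: Record<string, unknown>;
  id?: string;
  index?: number;
}

interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
  metadata?: Record<string, unknown>;
}

// Copies the array so later mutation of the input does not leak into the wrapper.
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation {
  return { threadId, messages: [...messages] };
}

function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[] {
  return [...conversation.messages];
}

// Hypothetical compressor that performs no compression; shows dispatch only.
class PassThroughCompressor {
  compress(messages: SlimContextMessage[]): Promise<SlimContextMessage[]>;
  compress(conversation: SlimContextConversation): Promise<SlimContextConversation>;
  async compress(
    input: SlimContextMessage[] | SlimContextConversation,
  ): Promise<SlimContextMessage[] | SlimContextConversation> {
    if (Array.isArray(input)) {
      // Legacy path: array in, array out, no threadId available for caching.
      return [...input];
    }
    // New path: the threadId travels with the conversation wrapper.
    return { ...input, messages: [...input.messages] };
  }
}
```

Callers that pass a plain array keep their current behavior, while callers that pass a wrapper get the wrapper back with its threadId intact.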

Integration points:

  • Modify SummarizeCompressor.compress() to check cache before summarizing
  • Add method overloading to support both message arrays and conversation wrappers
  • Add store configuration to SummarizeConfig
  • Update token estimation to account for cached summaries
  • Cache key generation uses conversation.threadId instead of per-message repetition
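
The cache check in SummarizeCompressor.compress() could follow a read-through pattern along these lines. This is a sketch under assumptions: summarizeRangeCached and SummarizeFn are hypothetical names, the range is treated as inclusive to match the "5-15" key example, and the text formatting mirrors the current behavior shown earlier.

```typescript
// Store interface as proposed in this issue.
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

interface Message {
  role: string;
  content: string;
}

// Hypothetical BYOM callback: takes conversation text, returns a summary.
type SummarizeFn = (conversationText: string) => Promise<string>;

// Hypothetical helper: returns the summary for messages[start..end] (inclusive),
// calling the model only when the store has no entry for that range.
async function summarizeRangeCached(
  store: SlimContextStore,
  summarize: SummarizeFn,
  threadId: string,
  messages: Message[],
  start: number,
  end: number,
): Promise<string> {
  const key = `thread_${threadId}:summary:${start}-${end}`;
  const cached = await store.get(key);
  if (cached !== null) {
    return cached; // Cache hit: no model call.
  }
  const conversationText = messages
    .slice(start, end + 1)
    .map((msg) => `${msg.role}: ${msg.content}`)
    .join('\n---\n');
  const summary = await summarize(conversationText);
  await store.set(key, summary);
  return summary;
}
```

Because the key depends only on the thread ID and index range, this sketch would return a stale summary if a message inside the range were edited; that case is covered by the invalidation question below.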

Implementation Considerations

Open questions and design decisions

1. Summary Combination Strategy
When extending a cached summary (e.g., a summary of messages 5-15 is cached and a summary of messages 5-25 is needed):

Option A: AI-Driven Combination

  • Send model: existing summary (5-15) + new messages (16-25) → combined summary (5-25)
  • Pros: Intelligent merging, better context preservation, can resolve contradictions
  • Cons: More expensive, potentially slower, risk of AI hallucination

Option B: Client-Side Concatenation

  • Send model: only new messages (16-25) → new summary (16-25)
  • Concatenate: summary(5-15) + summary(16-25) = combined(5-25)
  • Pros: Cost-effective, faster, predictable behavior
  • Cons: Potential fragmentation, no cross-segment awareness

Option C: Hybrid Configurable

  • Allow users to choose strategy based on cost/quality tradeoffs
  • Default to client-side with option for AI-driven
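
To make the three options concrete, the sketch below shows a configurable extension step. CombineStrategy, extendSummary, the prompt wording, and the newline separator are all assumptions made for illustration; none of them are decided in this issue.

```typescript
// Hypothetical BYOM callback: takes text, returns a summary.
type SummarizeFn = (text: string) => Promise<string>;

// Hypothetical setting for Option C: 'concatenate' is Option B, 'ai' is Option A.
type CombineStrategy = 'concatenate' | 'ai';

// Extends a cached summary with messages that arrived after it was generated.
async function extendSummary(
  strategy: CombineStrategy,
  summarize: SummarizeFn,
  existingSummary: string,
  newMessagesText: string,
): Promise<string> {
  if (strategy === 'ai') {
    // Option A: the model sees the old summary plus the new messages
    // and produces one merged summary.
    return summarize(
      `Previous summary:\n${existingSummary}\n\nNew messages:\n${newMessagesText}`,
    );
  }
  // Option B: the model sees only the new messages; the two summaries
  // are joined on the client without further model involvement.
  const newSummary = await summarize(newMessagesText);
  return `${existingSummary}\n${newSummary}`;
}
```

Both branches make exactly one model call per extension. The difference is that Option A's input also contains the existing summary, while Option B's input is limited to the new messages.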

2. Cache Invalidation Strategy

  • Should cache entries expire?
  • How to handle message updates/edits?
  • Thread-based vs global cache management

3. Store Interface Scope

  • Keep minimal (get/set/delete) or add advanced features (batch operations, TTL)?
  • Async vs sync interface design?
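
One argument for keeping the interface minimal is that features such as TTL can be layered on top by the user. The sketch below wraps any SlimContextStore and stores an expiry timestamp next to each value. TtlStore, its JSON envelope, and the injectable clock are hypothetical and only meant to inform the discussion.

```typescript
// Store interface as proposed in this issue.
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Hypothetical decorator: adds expiry to any store without changing the interface.
class TtlStore implements SlimContextStore {
  private readonly inner: SlimContextStore;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(inner: SlimContextStore, ttlMs: number, now: () => number = () => Date.now()) {
    this.inner = inner;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string): Promise<string | null> {
    const raw = await this.inner.get(key);
    if (raw === null) return null;
    // Assumption: every value in the inner store was written by this wrapper.
    const entry = JSON.parse(raw) as { value: string; expiresAt: number };
    if (this.now() >= entry.expiresAt) {
      await this.inner.delete(key); // Expired: evict lazily on read.
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string): Promise<void> {
    const entry = { value, expiresAt: this.now() + this.ttlMs };
    await this.inner.set(key, JSON.stringify(entry));
  }

  async delete(key: string): Promise<void> {
    await this.inner.delete(key);
  }
}
```

Backends with native expiry, such as Redis, would not need this wrapper, which is another reason the core interface may not need to know about TTL.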

4. Thread ID Management

  • Who provides the thread ID? User application or library?
  • Default behavior when no thread ID provided?
  • Should we auto-generate thread IDs for backward compatibility?

5. Conversation Wrapper Benefits

  • Eliminates repetition: No threadId duplication across messages
  • Cleaner API: Separates conversation context from individual message data
  • Better performance: Avoids storing and serializing a threadId on every message
  • Extensible: Easy to add conversation-level metadata without touching messages

Acceptance Criteria

Definition of done

  • SlimContextStore interface defined in src/interfaces.ts
  • SlimContextConversation wrapper interface implemented
  • Method overloading for compressors (both message array and conversation wrapper)
  • Utility functions for format conversion (wrapMessages/unwrapMessages)
  • Modified SummarizeCompressor to use store for caching
  • Cache key generation utilities using conversation context
  • InMemoryStore reference implementation
  • Summary combination strategy implemented (choose one approach initially)
  • Unit tests for caching behavior and backward compatibility
  • Performance benchmarks showing improvement
  • Documentation for store integration and new conversation wrapper
  • Full backward compatibility maintained (existing compress(messages[]) unchanged)

Additional Context

Supporting information

Design principles alignment:

  • Model-agnostic: Store interface doesn't depend on specific storage technology
  • Framework-independent: Works with any storage backend (Redis, DB, filesystem, memory)
  • BYOS pattern (mirroring BYOM): Users provide their own store implementation
  • Zero runtime dependencies: Core library remains dependency-free

Potential store implementations users might provide:

  • Redis for distributed caching
  • Database tables for persistence
  • File system for local caching
  • Cloud storage (S3, etc.) for serverless environments

Performance impact:

  • Should significantly reduce AI model calls for repeated conversation compression
  • Cache hits avoid expensive summarization operations
  • Memory usage increases with cached summaries (acceptable tradeoff)
