acidkill · Jun 22, 2026
@@ -1,7 +1,7 @@
 # MCP Tools Reference
 
 Complete reference for all Surreal-Memory MCP tools.
-**53 tools** available via MCP stdio transport.
+**56 tools** available via MCP stdio transport.
 
 !!! tip
     Tools are called as MCP tool calls, not CLI commands. In Claude Code, call `smem_recall` directly — do not run `smem recall` in terminal.
@@ -71,6 +71,9 @@ Complete reference for all Surreal-Memory MCP tools.
   - [`smem_report_outcome`](#smem_report_outcome)
   - [`smem_budget`](#smem_budget)
   - [`smem_tier`](#smem_tier)
+  - [`smem_offload`](#smem_offload)
+  - [`smem_inflate`](#smem_inflate)
+  - [`smem_situation`](#smem_situation)
 
 ---
 
@@ -94,6 +97,7 @@ Store a memory. Auto-detects type if not specified. Error resolution: when a new
 | `source_id` | string | No | — | Link this memory to a registered source. Creates a SOURCE_OF synapse for provenance tracking. |
 | `context` | object | No | — | Structured context dict merged into content server-side using type-specific templates. Keys like 'reason', 'alternati... |
 | `ephemeral` | boolean | No | — | Session-scoped memory: auto-expires after TTL (default 24h), never synced to cloud, excluded from consolidation. Use ... |
+| `verbose_extraction` | boolean | No | — | Surface concept-extraction observability stats (dropped_short, dropped_noise, dropped_duplicate_entity) in the respon... |
 | `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
 | `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
 
@@ -126,6 +130,7 @@ Query memories by semantic search with confidence ranking.
 | `mode` | string (`associative`, `exact`) | No | — | Recall mode: 'associative' (default) returns formatted context, 'exact' returns raw neuron contents verbatim without ... |
 | `include_citations` | boolean | No | default: true | Include citation and audit trail in exact recall results (default: true). |
 | `recall_token_budget` | integer | No | — | When set, activates budget-aware fiber selection: ranks fibers by value-per-token and selects the most efficient ones... |
+| `prefer_recent` | boolean | No | — | Re-rank matched fibers newest-first (by time_end, fallback created_at). Use for queries about current state ('what's ... |
 | `permanent_only` | boolean | No | — | Exclude ephemeral (session-scoped) memories from results. Default: false (include all). |
 | `clean_for_prompt` | boolean | No | — | Return clean bullet-point text without section headers or neuron-type tags. Use when injecting recall output into pro... |
 | `tier` | string (`hot`, `warm`, `cold`) | No | — | Filter results by memory tier. Only return memories matching this tier. |
@@ -803,6 +808,37 @@ Auto-tier management — promote/demote memories between HOT/WARM/COLD based on
 | `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
 | `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
 
+### `smem_offload`
+
+Store a large tool result as an ephemeral neuron (24h TTL) and return a compact summary plus a `ref_id`. Use when tool output is large (>2KB) and you may need to inspect it again later without keeping it in context. Retrieve the full content later with `smem_inflate(ref_id)`.
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `content` | string | Yes | — | Raw tool output to offload (≤100k chars) |
+| `tool_name` | string | Yes | — | Name of the tool that produced this output (e.g. 'ls', 'grep') |
+| `summary` | string | No | — | Caller-provided summary. If omitted, an auto-summary (first 200 chars + size hint) is generated. |
+| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
+| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
+
+### `smem_inflate`
+
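+Retrieve the full content of a previously offloaded tool result by its `ref_id` (returned from `smem_offload`). Returns the stored content, which was sanitized and auto-redacted at offload time, together with the tool name and summary. Returns an error if the ref has expired, never existed, or does not point to an offloaded payload.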
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `ref_id` | string | Yes | — | ref_id returned by smem_offload |
+| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
+| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
+
+### `smem_situation`
+
+One-shot snapshot of the current working situation: active session task, top 3 recent decisions, open blockers, gap detection. Replaces smem_recap + multiple smem_recall calls when resuming a session. Pure read — never mutates state.
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
+| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
+
 ---
 
-*Auto-generated by `scripts/gen_mcp_docs.py` from `tool_schemas.py` — 53 tools.*
+*Auto-generated by `scripts/gen_mcp_docs.py` from `tool_schemas.py` — 56 tools.*
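
Reviewer note: a minimal sketch of how a client might pair the two new tools, assuming the standard MCP JSON-RPC `tools/call` request shape. The helper `make_tool_call`, the request ids, and the placeholder `ref_id` are hypothetical and not part of this PR; the argument names come from the parameter tables above.

```python
import json
from typing import Any


def make_tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP JSON-RPC `tools/call` request (hypothetical helper, not in this PR)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Offload a large tool result; `content` and `tool_name` are the required parameters.
offload_request = make_tool_call(
    1,
    "smem_offload",
    {"content": "line\n" * 1000, "tool_name": "grep"},
)

# Later, fetch the stored payload back. The ref_id below is a placeholder;
# a real one comes from the smem_offload response.
inflate_request = make_tool_call(2, "smem_inflate", {"ref_id": "<ref_id from offload>"})

print(json.dumps(inflate_request, indent=2))
```

Only the summary and `ref_id` need to stay in the agent's context between the two calls, which is the point of the offload pattern.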
@@ -13,7 +13,7 @@
 def preset_cmd(
     name: Annotated[
         str,
-        typer.Argument(help="Preset name: safe-cost, balanced, max-recall"),
+        typer.Argument(help="Preset name: safe-cost, balanced, max-recall, chat-heavy"),
     ] = "",
     list_available: Annotated[
         bool,

@@ -1,6 +1,6 @@
 """Static configuration presets for Surreal-Memory.
 
-Three built-in profiles that configure brain behavior, maintenance,
+Four built-in profiles that configure brain behavior, maintenance,
 and retrieval for different use cases. Presets are static dicts
 (not a plugin system) to keep the surface simple and predictable.
 
@@ -82,16 +82,38 @@
     },
 }
 
+CHAT_HEAVY: dict[str, dict[str, Any]] = {
+    "brain": {
+        "decay_rate": 0.15,
+        "reinforcement_delta": 0.05,
+        "activation_threshold": 0.25,
+        "max_spread_hops": 3,
+        "max_context_tokens": 800,
+        "freshness_weight": 0.25,
+    },
+    "maintenance": {
+        "auto_consolidate": True,
+        "check_interval": 20,
+        "auto_consolidate_strategies": ["prune", "merge"],
+        "consolidate_cooldown_minutes": 20,
+    },
+    "eternal": {
+        "max_context_tokens": 64_000,
+    },
+}
+
 _PRESETS: dict[str, dict[str, dict[str, Any]]] = {
     "safe-cost": SAFE_COST,
     "balanced": BALANCED,
     "max-recall": MAX_RECALL,
+    "chat-heavy": CHAT_HEAVY,
 }
 
 _DESCRIPTIONS: dict[str, str] = {
     "safe-cost": "Lower token usage, faster decay, aggressive pruning",
     "balanced": "Default settings — good all-around performance",
     "max-recall": "Maximum retention, deeper retrieval, conservative pruning",
+    "chat-heavy": "Conversational agents (Telegram/Discord/Slack) — fast decay, recent-biased, compact",
 }
 
 

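Reviewer note: the diff shows the preset as a static dict of sections but not how it is applied. The sketch below is one plausible way to overlay such a preset on an existing config, section by section. `apply_preset` and the `current` config are hypothetical; only the `CHAT_HEAVY` values are taken from this diff.

```python
from copy import deepcopy
from typing import Any

# Values copied from the CHAT_HEAVY preset in this diff.
CHAT_HEAVY: dict[str, dict[str, Any]] = {
    "brain": {
        "decay_rate": 0.15,
        "reinforcement_delta": 0.05,
        "activation_threshold": 0.25,
        "max_spread_hops": 3,
        "max_context_tokens": 800,
        "freshness_weight": 0.25,
    },
    "maintenance": {
        "auto_consolidate": True,
        "check_interval": 20,
        "auto_consolidate_strategies": ["prune", "merge"],
        "consolidate_cooldown_minutes": 20,
    },
    "eternal": {"max_context_tokens": 64_000},
}


def apply_preset(
    config: dict[str, dict[str, Any]], preset: dict[str, dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Return a copy of `config` with each preset section overlaid key by key.

    Hypothetical helper: keys absent from the preset are preserved, and
    neither input dict is mutated.
    """
    merged = deepcopy(config)
    for section, values in preset.items():
        merged.setdefault(section, {}).update(deepcopy(values))
    return merged


# Hypothetical starting config with one user-specific key.
current = {"brain": {"decay_rate": 0.1, "custom_flag": True}, "maintenance": {}}
result = apply_preset(current, CHAT_HEAVY)
print(result["brain"]["decay_rate"], result["brain"]["custom_flag"])
```

A shallow per-section overlay keeps the behavior predictable, which matches the stated goal of static dicts over a plugin system.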
@@ -64,13 +64,17 @@ class EncodingResult:
         neurons_created: List of newly created neurons
         neurons_linked: List of existing neuron IDs that were linked
         synapses_created: List of newly created synapses
+        extraction_stats: Optional concept-extraction counters when callers
+            opt in via verbose_extraction. Surface schema:
+            ``{"dropped_short", "dropped_noise", "dropped_duplicate_entity"}``.
     """
 
     fiber: Fiber
     neurons_created: list[Neuron]
     neurons_linked: list[str]
     synapses_created: list[Synapse]
     conflicts_detected: int = 0
+    extraction_stats: dict[str, int] | None = None
 
 
 def build_default_pipeline(
@@ -432,6 +436,11 @@ async def encode(
             neurons_linked=ctx.neurons_linked,
             synapses_created=ctx.synapses_created,
             conflicts_detected=ctx.conflicts_detected,
+            extraction_stats={
+                "dropped_short": ctx.dropped_short,
+                "dropped_noise": ctx.dropped_noise,
+                "dropped_duplicate_entity": ctx.dropped_duplicate_entity,
+            },
         )
 
     async def _post_encode_neuro(self, anchor: Neuron) -> None:

@@ -72,6 +72,12 @@ class PipelineContext:
     # Entities stored here are first-mentions — not yet promoted to neurons
     deferred_entity_refs: list[str] = field(default_factory=list)
 
+    # Concept extraction observability — incremented by ExtractConceptNeuronsStep.
+    # Surfaced through EncodingResult.extraction_stats only when callers opt in.
+    dropped_short: int = 0
+    dropped_noise: int = 0
+    dropped_duplicate_entity: int = 0
+
 
 @runtime_checkable
 class PipelineStep(Protocol):

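Reviewer note: the encoder hunk fills `extraction_stats` on every call, so the opt-in described in the comments presumably happens where the tool response is assembled. That handler is not shown in this diff. The sketch below illustrates one way such gating could look; `EncodingResultStub` and `build_response` are hypothetical stand-ins, not code from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class EncodingResultStub:
    """Stand-in for EncodingResult carrying only the fields needed here."""

    neurons_created: int
    extraction_stats: dict[str, int] | None = None


def build_response(result: EncodingResultStub, args: dict[str, Any]) -> dict[str, Any]:
    """Include extraction stats only when the caller passed verbose_extraction."""
    response: dict[str, Any] = {"neurons_created": result.neurons_created}
    if args.get("verbose_extraction") and result.extraction_stats is not None:
        # Copy so later mutation of the result does not leak into the response.
        response["extraction_stats"] = dict(result.extraction_stats)
    return response


stub = EncodingResultStub(
    neurons_created=3,
    extraction_stats={"dropped_short": 2, "dropped_noise": 1, "dropped_duplicate_entity": 0},
)
quiet = build_response(stub, {})
verbose = build_response(stub, {"verbose_extraction": True})
print(quiet)
print(verbose)
```

Gating at the response layer keeps the default payload small while leaving the counters available on the result object for tests and logging.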
@@ -320,11 +320,14 @@ def _is_valid_concept(kw: str) -> bool:
             kw_lower = kw.lower()
             # Minimum 4 chars: shorter words produce too many noise concepts ("ai", "os")
             if len(kw_lower) < 4:
+                ctx.dropped_short += 1
                 return False
             if kw_lower in _NOISE_CONCEPTS:
+                ctx.dropped_noise += 1
                 return False
             # Skip if already captured as an entity neuron
             if kw_lower in entity_content:
+                ctx.dropped_duplicate_entity += 1
                 return False
             return True
 

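Reviewer note: a self-contained sketch of how the three counters behave, mirroring the check order in the hunk above (length first, then noise list, then entity duplicates). `DropCounters`, `NOISE`, and `filter_concepts` are hypothetical stand-ins for `PipelineContext`, `_NOISE_CONCEPTS`, and the extraction step; `entity_content` is assumed to be a set of lowercased entity strings.

```python
from dataclasses import dataclass


@dataclass
class DropCounters:
    """Stand-in for the three observability fields added to PipelineContext."""

    dropped_short: int = 0
    dropped_noise: int = 0
    dropped_duplicate_entity: int = 0


NOISE = {"thing", "stuff"}  # hypothetical stand-in for _NOISE_CONCEPTS


def filter_concepts(keywords: list[str], entity_content: set[str], ctx: DropCounters) -> list[str]:
    """Keep valid concepts and count each rejection under exactly one reason."""
    kept: list[str] = []
    for kw in keywords:
        kw_lower = kw.lower()
        if len(kw_lower) < 4:
            ctx.dropped_short += 1
        elif kw_lower in NOISE:
            ctx.dropped_noise += 1
        elif kw_lower in entity_content:
            ctx.dropped_duplicate_entity += 1
        else:
            kept.append(kw)
    return kept


ctx = DropCounters()
kept = filter_concepts(["AI", "stuff", "Redis", "caching", "os"], {"redis"}, ctx)
print(kept, ctx)
```

Because the checks short-circuit, a keyword is attributed to the first rule it fails, so the three counters never double-count a single keyword.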
@@ -0,0 +1,202 @@
+"""MCP handler mixin for tool result offload tools.
+
+Phase 1 of agent-ergonomics: reduces context bloat by storing large tool
+results as ephemeral neurons (24h TTL) and returning a compact ref + summary
+that the agent can drill into via ``smem_inflate`` when needed.
+
+No LLM calls, no compression — pure store/lookup.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING, Any
+
+from surreal_memory.core.neuron import Neuron, NeuronType
+from surreal_memory.engine.token_budget import TOKEN_RATIO
+from surreal_memory.mcp.tool_handler_utils import _get_brain_or_error
+
+if TYPE_CHECKING:
+    from surreal_memory.storage.base import NeuralStorage
+    from surreal_memory.unified_config import UnifiedConfig
+
+logger = logging.getLogger(__name__)
+
+# Hard cap on offload content — same ceiling as smem_remember to keep storage sane
+_MAX_CONTENT_LEN = 100_000
+
+# Caps on caller-controlled string fields (handler-side, schema is only advisory)
+_MAX_TOOL_NAME_LEN = 100
+_MAX_EXPLICIT_SUMMARY_LEN = 500
+
+# Preview length for auto-generated summaries
+_SUMMARY_PREVIEW_LEN = 200
+
+# Hard cap on the final summary string returned to caller — keeps the
+# offload contract ("summary is small") true even with long tool_names.
+_MAX_SUMMARY_LEN = 300
+
+
+def _estimate_tokens(content: str) -> int:
+    """Rough token estimate — uses whichever of (words x ratio) or (chars / 4) is larger.
+
+    The dual estimate guards against pathological inputs (e.g. a 5000-char run
+    of identical bytes with no whitespace) where word count under-reports cost.
+    """
+    words = len(content.split())
+    word_based = int(words * TOKEN_RATIO)
+    char_based = len(content) // 4  # ~4 chars/token rule of thumb for English
+    return max(1, word_based, char_based)
+
+
+def _build_summary(content: str, tool_name: str) -> str:
+    """Generate a compact preview + size hint for an offloaded payload.
+
+    Output is hard-capped at ``_MAX_SUMMARY_LEN`` to keep the offload
+    contract honest regardless of tool_name length.
+    """
+    preview = content[:_SUMMARY_PREVIEW_LEN].replace("\n", " ").strip()
+    if len(content) > _SUMMARY_PREVIEW_LEN:
+        preview += "…"
+    line_count = content.count("\n") + 1
+    byte_count = len(content)  # character count; equals the byte size only for ASCII content
+    summary = f"[{tool_name}] {preview} (~{line_count} lines, {byte_count}B)"
+    if len(summary) > _MAX_SUMMARY_LEN:
+        summary = summary[: _MAX_SUMMARY_LEN - 1] + "…"
+    return summary
+
+
+class OffloadHandler:
+    """Mixin: tool result offload + inflate tools."""
+
+    if TYPE_CHECKING:
+        config: UnifiedConfig
+
+        async def get_storage(self) -> NeuralStorage:
+            raise NotImplementedError
+
+    async def _offload(self, args: dict[str, Any]) -> dict[str, Any]:
+        """Store a large tool result as an ephemeral neuron, return a compact ref.
+
+        Args:
+            content: Raw tool output to offload (required, ≤100k chars). The
+                content is sanitized for prompt-injection markers and run
+                through the auto-redactor before storage (same pipeline as
+                smem_remember) so leaked secrets in tool output are scrubbed.
+            tool_name: Name of the tool that produced the output (required,
+                truncated to 100 chars).
+            summary: Caller-provided summary (optional; auto-generated if
+                absent, max 500 chars).
+            (ttl is fixed at 24h via the ephemeral expiry handler — no
+            ttl_hours arg is accepted.)
+
+        Returns:
+            ``{ref_id, summary, token_saved, redacted}`` on success,
+            ``{error}`` on failure. ``redacted`` is True when sensitive
+            content was scrubbed.
+        """
+        content = args.get("content")
+        tool_name_raw = args.get("tool_name") or "unknown"
+        tool_name = str(tool_name_raw)[:_MAX_TOOL_NAME_LEN]
+        explicit_summary_raw = args.get("summary")
+        explicit_summary = (
+            str(explicit_summary_raw)[:_MAX_EXPLICIT_SUMMARY_LEN]
+            if isinstance(explicit_summary_raw, str)
+            else None
+        )
+
+        if not content or not isinstance(content, str):
+            return {"error": "content is required and must be a non-empty string"}
+        if len(content) > _MAX_CONTENT_LEN:
+            return {"error": f"Content too long ({len(content)} chars). Max: {_MAX_CONTENT_LEN}."}
+
+        try:
+            storage = await self.get_storage()
+            _brain, err = await _get_brain_or_error(storage)
+            if err:
+                return err
+
+            # Defense in depth — tool output is a common vector for accidental
+            # secret capture (API keys in grep, tokens in curl logs, etc).
+            # Mirror the remember_handler safety pipeline.
+            from surreal_memory.safety.input_firewall import sanitize_explicit_content
+            from surreal_memory.safety.sensitive import auto_redact_content
+
+            content = sanitize_explicit_content(content)
+            try:
+                redact_severity = int(self.config.safety.auto_redact_min_severity)
+            except (TypeError, ValueError, AttributeError):
+                redact_severity = 3
+            redacted_content, redacted_matches, _hash = auto_redact_content(
+                content, min_severity=redact_severity
+            )
+            redacted = bool(redacted_matches)
+            if redacted:
+                content = redacted_content
+                logger.info(
+                    "smem_offload: auto-redacted %d sensitive matches for tool=%s",
+                    len(redacted_matches),
+                    tool_name,
+                )
+
+            summary = explicit_summary or _build_summary(content, tool_name)
+            token_estimate = _estimate_tokens(content)
+            summary_tokens = _estimate_tokens(summary)
+            token_saved = max(0, token_estimate - summary_tokens)
+
+            neuron = Neuron.create(
+                type=NeuronType.CONCEPT,
+                content=content,
+                metadata={
+                    "_source": "tool_offload",
+                    "_tool_name": tool_name,
+                    "_summary": summary,
+                    "_offload_token_estimate": token_estimate,
+                    "_offload_redacted": redacted,
+                },
+                ephemeral=True,
+            )
+            await storage.add_neuron(neuron)
+
+            return {
+                "ref_id": neuron.id,
+                "summary": summary,
+                "token_saved": token_saved,
+                "redacted": redacted,
+            }
+        except Exception:
+            logger.error("Offload failed for tool=%s", tool_name, exc_info=True)
+            return {"error": "Offload failed"}
+
+    async def _inflate(self, args: dict[str, Any]) -> dict[str, Any]:
+        """Retrieve full content of a previously offloaded neuron by ref_id.
+
+        Args:
+            ref_id: Neuron ID returned by ``smem_offload`` (required)
+
+        Returns:
+            ``{content, tool_name, summary}`` on success, ``{error}`` on failure.
+        """
+        ref_id = args.get("ref_id")
+        if not ref_id or not isinstance(ref_id, str):
+            return {"error": "ref_id is required and must be a string"}
+
+        try:
+            storage = await self.get_storage()
+            neuron = await storage.get_neuron(ref_id)
+            if neuron is None:
+                return {"error": f"ref_id not found or expired: {ref_id}"}
+
+            meta = neuron.metadata or {}
+            if meta.get("_source") != "tool_offload":
+                # Don't allow inflate to peek at arbitrary neurons — only offload payloads.
+                return {"error": f"ref_id {ref_id} is not an offloaded payload"}
+
+            return {
+                "content": neuron.content,
+                "tool_name": meta.get("_tool_name", "unknown"),
+                "summary": meta.get("_summary", ""),
+            }
+        except Exception:
+            logger.error("Inflate failed for ref_id=%s", ref_id, exc_info=True)
+            return {"error": "Inflate failed"}
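
Reviewer note: a minimal in-memory sketch of the offload/inflate round trip and the `_source` guard, written to make the contract easy to test in isolation. The dict-backed `_STORE`, the `uuid`-based ids, and the simplified summary are assumptions for illustration; the real handler uses `NeuralStorage`, `Neuron.create`, sanitization, and auto-redaction as shown in the diff.

```python
from __future__ import annotations

import uuid
from typing import Any

# In-memory stand-in for NeuralStorage: ref_id -> (content, metadata).
_STORE: dict[str, tuple[str, dict[str, Any]]] = {}


def offload(content: str, tool_name: str) -> dict[str, Any]:
    """Store content under a fresh ref_id and return a compact reference."""
    if not content or not isinstance(content, str):
        return {"error": "content is required and must be a non-empty string"}
    summary = f"[{tool_name}] {content[:200]}"  # simplified preview, no size hint
    ref_id = str(uuid.uuid4())
    _STORE[ref_id] = (
        content,
        {"_source": "tool_offload", "_tool_name": tool_name, "_summary": summary},
    )
    return {"ref_id": ref_id, "summary": summary}


def inflate(ref_id: str) -> dict[str, Any]:
    """Return the stored payload, refusing anything not created by offload."""
    entry = _STORE.get(ref_id)
    if entry is None:
        return {"error": f"ref_id not found or expired: {ref_id}"}
    content, meta = entry
    if meta.get("_source") != "tool_offload":
        return {"error": f"ref_id {ref_id} is not an offloaded payload"}
    return {"content": content, "tool_name": meta["_tool_name"], "summary": meta["_summary"]}


ref = offload("total 3\nfile_a\nfile_b\nfile_c", "ls")
round_trip = inflate(ref["ref_id"])

# A record that was not created by offload must not be readable via inflate.
_STORE["ordinary-memory"] = ("private note", {"_source": "remember"})
blocked = inflate("ordinary-memory")
missing = inflate("does-not-exist")
print(round_trip["tool_name"], blocked, missing)
```

The guard on `_source` is what keeps `inflate` from becoming a general-purpose read of arbitrary stored records.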