Merged
40 changes: 38 additions & 2 deletions docs/api/mcp-tools.md
@@ -1,7 +1,7 @@
# MCP Tools Reference

Complete reference for all Surreal-Memory MCP tools.
-**53 tools** available via MCP stdio transport.
+**56 tools** available via MCP stdio transport.

!!! tip
Tools are called as MCP tool calls, not CLI commands. In Claude Code, call `smem_recall` directly — do not run `smem recall` in terminal.
@@ -71,6 +71,9 @@ Complete reference for all Surreal-Memory MCP tools.
- [`smem_report_outcome`](#smem_report_outcome)
- [`smem_budget`](#smem_budget)
- [`smem_tier`](#smem_tier)
- [`smem_offload`](#smem_offload)
- [`smem_inflate`](#smem_inflate)
- [`smem_situation`](#smem_situation)

---

@@ -94,6 +97,7 @@ Store a memory. Auto-detects type if not specified. Error resolution: when a new
| `source_id` | string | No | — | Link this memory to a registered source. Creates a SOURCE_OF synapse for provenance tracking. |
| `context` | object | No | — | Structured context dict merged into content server-side using type-specific templates. Keys like 'reason', 'alternati... |
| `ephemeral` | boolean | No | — | Session-scoped memory: auto-expires after TTL (default 24h), never synced to cloud, excluded from consolidation. Use ... |
| `verbose_extraction` | boolean | No | — | Surface concept-extraction observability stats (dropped_short, dropped_noise, dropped_duplicate_entity) in the respon... |
| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |

@@ -126,6 +130,7 @@ Query memories by semantic search with confidence ranking.
| `mode` | string (`associative`, `exact`) | No | — | Recall mode: 'associative' (default) returns formatted context, 'exact' returns raw neuron contents verbatim without ... |
| `include_citations` | boolean | No | default: true | Include citation and audit trail in exact recall results (default: true). |
| `recall_token_budget` | integer | No | — | When set, activates budget-aware fiber selection: ranks fibers by value-per-token and selects the most efficient ones... |
| `prefer_recent` | boolean | No | — | Re-rank matched fibers newest-first (by time_end, fallback created_at). Use for queries about current state ('what's ... |
| `permanent_only` | boolean | No | — | Exclude ephemeral (session-scoped) memories from results. Default: false (include all). |
| `clean_for_prompt` | boolean | No | — | Return clean bullet-point text without section headers or neuron-type tags. Use when injecting recall output into pro... |
| `tier` | string (`hot`, `warm`, `cold`) | No | — | Filter results by memory tier. Only return memories matching this tier. |
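
As an illustration of the `prefer_recent` ordering described in the table above, the sketch below sorts matched fibers newest-first by `time_end`, falling back to `created_at` when `time_end` is missing. `FiberStub` and `rerank_newest_first` are hypothetical names used only for this example; they are not part of Surreal-Memory, and the server's actual re-ranking may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FiberStub:
    """Hypothetical stand-in for a matched fiber (not the project's Fiber class)."""

    summary: str
    created_at: datetime
    time_end: Optional[datetime] = None


def rerank_newest_first(fibers: list[FiberStub]) -> list[FiberStub]:
    """Sort by time_end when present, otherwise by created_at, newest first."""
    return sorted(fibers, key=lambda f: f.time_end or f.created_at, reverse=True)


fibers = [
    FiberStub("old decision", created_at=datetime(2024, 1, 1)),
    FiberStub(
        "long-running task",
        created_at=datetime(2024, 1, 2),
        time_end=datetime(2024, 3, 1),
    ),
    FiberStub("recent note", created_at=datetime(2024, 2, 1)),
]
ranked = [f.summary for f in rerank_newest_first(fibers)]
print(ranked)
```

The task that started early but ended last ranks first, which is the behaviour one would want for "current state" queries.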
@@ -803,6 +808,37 @@ Auto-tier management — promote/demote memories between HOT/WARM/COLD based on
| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |

### `smem_offload`

Store a large tool result as an ephemeral neuron (24h TTL) and return a compact summary + ref_id. Use when tool output is large (>2KB) and you may need to inspect it again later without keeping it in context. Drill back into full content via smem_inflate(ref_id).

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `content` | string | Yes | — | Raw tool output to offload (≤100k chars) |
| `tool_name` | string | Yes | — | Name of the tool that produced this output (e.g. 'ls', 'grep') |
| `summary` | string | No | — | Caller-provided summary. If omitted, an auto-summary (first 200 chars + size hint) is generated. |
| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
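
Because these tools are invoked as MCP tool calls rather than CLI commands, a call to `smem_offload` travels as a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message by hand to show where the parameters from the table go. `build_tool_call` is a hypothetical helper for illustration; in practice an MCP client library constructs this message.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Example payload: a large grep result that would otherwise sit in context.
grep_output = "src/app.py:12: TODO refactor\n" * 200
request = build_tool_call(
    1,
    "smem_offload",
    {"content": grep_output, "tool_name": "grep"},
)
print(request[:80])
```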

### `smem_inflate`

Retrieve full content of a previously offloaded tool result by its ref_id (returned from smem_offload). Returns the original raw content. Returns an error if the ref has expired or never existed.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `ref_id` | string | Yes | — | ref_id returned by smem_offload |
| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |
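
The offload/inflate contract (store once, get back a `ref_id`, retrieve until the 24h TTL lapses) can be sketched with an in-memory dictionary. This is a simplified model under stated assumptions, not the server implementation: `_store`, `offload`, and `inflate` are hypothetical names, and the `now` parameter exists only to make expiry testable.

```python
from __future__ import annotations

import time
import uuid

TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as documented for smem_offload

# Hypothetical in-memory store: ref_id -> (expires_at, content)
_store: dict[str, tuple[float, str]] = {}


def offload(content: str, tool_name: str, now: float | None = None) -> dict:
    """Store content and return a compact reference plus a short summary."""
    now = time.time() if now is None else now
    ref_id = uuid.uuid4().hex
    _store[ref_id] = (now + TTL_SECONDS, content)
    return {"ref_id": ref_id, "summary": f"[{tool_name}] {content[:200]}"}


def inflate(ref_id: str, now: float | None = None) -> dict:
    """Return the original content, or an error if missing or expired."""
    now = time.time() if now is None else now
    entry = _store.get(ref_id)
    if entry is None or entry[0] <= now:
        return {"error": f"ref_id not found or expired: {ref_id}"}
    return {"content": entry[1]}


big_output = "line of tool output\n" * 1000
ref = offload(big_output, "grep", now=0.0)
fresh = inflate(ref["ref_id"], now=60.0)
stale = inflate(ref["ref_id"], now=TTL_SECONDS + 1.0)
print(len(ref["summary"]), "error" in stale)
```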

### `smem_situation`

One-shot snapshot of the current working situation: active session task, top 3 recent decisions, open blockers, gap detection. Replaces smem_recap + multiple smem_recall calls when resuming a session. Pure read — never mutates state.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `compact` | boolean | No | — | Return compact response (strip metadata hints, truncate lists). Saves 60-80% tokens. |
| `token_budget` | integer | No | — | Max tokens for response. Progressively strips content to fit budget. |

---

-*Auto-generated by `scripts/gen_mcp_docs.py` from `tool_schemas.py` — 53 tools.*
+*Auto-generated by `scripts/gen_mcp_docs.py` from `tool_schemas.py` — 56 tools.*
2 changes: 1 addition & 1 deletion src/surreal_memory/cli/commands/config_cmd.py
@@ -13,7 +13,7 @@
def preset_cmd(
    name: Annotated[
        str,
-        typer.Argument(help="Preset name: safe-cost, balanced, max-recall"),
+        typer.Argument(help="Preset name: safe-cost, balanced, max-recall, chat-heavy"),
    ] = "",
    list_available: Annotated[
        bool,
24 changes: 23 additions & 1 deletion src/surreal_memory/config_presets.py
@@ -1,6 +1,6 @@
"""Static configuration presets for Surreal-Memory.

-Three built-in profiles that configure brain behavior, maintenance,
+Four built-in profiles that configure brain behavior, maintenance,
and retrieval for different use cases. Presets are static dicts
(not a plugin system) to keep the surface simple and predictable.

@@ -82,16 +82,38 @@
    },
}

CHAT_HEAVY: dict[str, dict[str, Any]] = {
    "brain": {
        "decay_rate": 0.15,
        "reinforcement_delta": 0.05,
        "activation_threshold": 0.25,
        "max_spread_hops": 3,
        "max_context_tokens": 800,
        "freshness_weight": 0.25,
    },
    "maintenance": {
        "auto_consolidate": True,
        "check_interval": 20,
        "auto_consolidate_strategies": ["prune", "merge"],
        "consolidate_cooldown_minutes": 20,
    },
    "eternal": {
        "max_context_tokens": 64_000,
    },
}

_PRESETS: dict[str, dict[str, dict[str, Any]]] = {
    "safe-cost": SAFE_COST,
    "balanced": BALANCED,
    "max-recall": MAX_RECALL,
    "chat-heavy": CHAT_HEAVY,
}

_DESCRIPTIONS: dict[str, str] = {
    "safe-cost": "Lower token usage, faster decay, aggressive pruning",
    "balanced": "Default settings — good all-around performance",
    "max-recall": "Maximum retention, deeper retrieval, conservative pruning",
    "chat-heavy": "Conversational agents (Telegram/Discord/Slack) — fast decay, recent-biased, compact",
}


9 changes: 9 additions & 0 deletions src/surreal_memory/engine/encoder.py
@@ -64,13 +64,17 @@ class EncodingResult:
        neurons_created: List of newly created neurons
        neurons_linked: List of existing neuron IDs that were linked
        synapses_created: List of newly created synapses
        extraction_stats: Optional concept-extraction counters when callers
            opt in via verbose_extraction. Surface schema:
            ``{"dropped_short", "dropped_noise", "dropped_duplicate_entity"}``.
    """

    fiber: Fiber
    neurons_created: list[Neuron]
    neurons_linked: list[str]
    synapses_created: list[Synapse]
    conflicts_detected: int = 0
    extraction_stats: dict[str, int] | None = None


def build_default_pipeline(
@@ -432,6 +436,11 @@ async def encode(
            neurons_linked=ctx.neurons_linked,
            synapses_created=ctx.synapses_created,
            conflicts_detected=ctx.conflicts_detected,
            extraction_stats={
                "dropped_short": ctx.dropped_short,
                "dropped_noise": ctx.dropped_noise,
                "dropped_duplicate_entity": ctx.dropped_duplicate_entity,
            },
        )

    async def _post_encode_neuro(self, anchor: Neuron) -> None:
6 changes: 6 additions & 0 deletions src/surreal_memory/engine/pipeline.py
@@ -72,6 +72,12 @@ class PipelineContext:
    # Entities stored here are first-mentions — not yet promoted to neurons
    deferred_entity_refs: list[str] = field(default_factory=list)

    # Concept extraction observability — incremented by ExtractConceptNeuronsStep.
    # Surfaced through EncodingResult.extraction_stats only when callers opt in.
    dropped_short: int = 0
    dropped_noise: int = 0
    dropped_duplicate_entity: int = 0


@runtime_checkable
class PipelineStep(Protocol):
3 changes: 3 additions & 0 deletions src/surreal_memory/engine/pipeline_steps.py
@@ -320,11 +320,14 @@ def _is_valid_concept(kw: str) -> bool:
    kw_lower = kw.lower()
    # Minimum 4 chars: shorter words produce too many noise concepts (e.g. "ai", "os")
    if len(kw_lower) < 4:
        ctx.dropped_short += 1
        return False
    if kw_lower in _NOISE_CONCEPTS:
        ctx.dropped_noise += 1
        return False
    # Skip if already captured as an entity neuron
    if kw_lower in entity_content:
        ctx.dropped_duplicate_entity += 1
        return False
    return True
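
To make the three counters concrete, here is a standalone sketch of the same filter order (length, noise list, entity duplicate) that tallies why each keyword was dropped. It is an illustration under assumptions, not the project's code: `ExtractionCounters`, `NOISE`, and `filter_concepts` are hypothetical names, the noise words are invented, and `entity_content` is assumed to be a set of lowercased entity strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExtractionCounters:
    """Hypothetical stand-in for the counters added to PipelineContext."""

    dropped_short: int = 0
    dropped_noise: int = 0
    dropped_duplicate_entity: int = 0


NOISE = frozenset({"thing", "stuff"})  # invented noise list for the example


def filter_concepts(
    keywords: list[str], entity_content: set[str], counters: ExtractionCounters
) -> list[str]:
    """Keep valid concepts and count each drop under exactly one reason."""
    kept = []
    for kw in keywords:
        kw_lower = kw.lower()
        if len(kw_lower) < 4:
            counters.dropped_short += 1
        elif kw_lower in NOISE:
            counters.dropped_noise += 1
        elif kw_lower in entity_content:
            counters.dropped_duplicate_entity += 1
        else:
            kept.append(kw)
    return kept


counters = ExtractionCounters()
kept = filter_concepts(
    ["AI", "stuff", "Postgres", "caching", "os"], {"postgres"}, counters
)
print(kept, counters)
```

Because the checks run in a fixed order, a keyword that is both short and noisy is only counted as short, so the three counters always sum to the number of dropped keywords.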

202 changes: 202 additions & 0 deletions src/surreal_memory/mcp/offload_handler.py
@@ -0,0 +1,202 @@
"""MCP handler mixin for tool result offload tools.

Phase 1 of agent-ergonomics: reduces context bloat by storing large tool
results as ephemeral neurons (24h TTL) and returning a compact ref + summary
that the agent can drill into via ``smem_inflate`` when needed.

No LLM calls, no compression — pure store/lookup.
"""

from __future__ import annotations

import logging
from typing import TYPE_CHECKING, Any

from surreal_memory.core.neuron import Neuron, NeuronType
from surreal_memory.engine.token_budget import TOKEN_RATIO
from surreal_memory.mcp.tool_handler_utils import _get_brain_or_error

if TYPE_CHECKING:
    from surreal_memory.storage.base import NeuralStorage
    from surreal_memory.unified_config import UnifiedConfig

logger = logging.getLogger(__name__)

# Hard cap on offload content — same ceiling as smem_remember to keep storage sane
_MAX_CONTENT_LEN = 100_000

# Caps on caller-controlled string fields (handler-side, schema is only advisory)
_MAX_TOOL_NAME_LEN = 100
_MAX_EXPLICIT_SUMMARY_LEN = 500

# Preview length for auto-generated summaries
_SUMMARY_PREVIEW_LEN = 200

# Hard cap on the final summary string returned to caller — keeps the
# offload contract ("summary is small") true even with long tool_names.
_MAX_SUMMARY_LEN = 300


def _estimate_tokens(content: str) -> int:
    """Rough token estimate — uses whichever of (words x ratio) or (chars / 4) is larger.

    The dual estimate guards against pathological inputs (e.g. a 5000-char run
    of identical bytes with no whitespace) where word count under-reports cost.
    """
    words = len(content.split())
    word_based = int(words * TOKEN_RATIO)
    char_based = len(content) // 4  # ~4 chars/token rule of thumb for English
    return max(1, word_based, char_based)


def _build_summary(content: str, tool_name: str) -> str:
    """Generate a compact preview + size hint for an offloaded payload.

    Output is hard-capped at ``_MAX_SUMMARY_LEN`` to keep the offload
    contract honest regardless of tool_name length.
    """
    preview = content[:_SUMMARY_PREVIEW_LEN].replace("\n", " ").strip()
    if len(content) > _SUMMARY_PREVIEW_LEN:
        preview += "…"
    line_count = content.count("\n") + 1
    # len() counts characters, not encoded bytes, so the "B" figure below is
    # only approximate for non-ASCII output.
    byte_count = len(content)
    summary = f"[{tool_name}] {preview} (~{line_count} lines, {byte_count}B)"
    if len(summary) > _MAX_SUMMARY_LEN:
        summary = summary[: _MAX_SUMMARY_LEN - 1] + "…"
    return summary


class OffloadHandler:
    """Mixin: tool result offload + inflate tools."""

    if TYPE_CHECKING:
        config: UnifiedConfig

        async def get_storage(self) -> NeuralStorage:
            raise NotImplementedError

    async def _offload(self, args: dict[str, Any]) -> dict[str, Any]:
        """Store a large tool result as an ephemeral neuron, return a compact ref.

        Args:
            content: Raw tool output to offload (required, ≤100k chars). The
                content is sanitized for prompt-injection markers and run
                through the auto-redactor before storage (same pipeline as
                smem_remember) so leaked secrets in tool output are scrubbed.
            tool_name: Name of the tool that produced the output (required,
                truncated to 100 chars).
            summary: Caller-provided summary (optional; auto-generated if
                absent, max 500 chars).
            (ttl is fixed at 24h via the ephemeral expiry handler — no
            ttl_hours arg is accepted.)

        Returns:
            ``{ref_id, summary, token_saved, redacted}`` on success,
            ``{error}`` on failure. ``redacted`` is True when sensitive
            content was scrubbed.
        """
        content = args.get("content")
        tool_name_raw = args.get("tool_name") or "unknown"
        tool_name = str(tool_name_raw)[:_MAX_TOOL_NAME_LEN]
        explicit_summary_raw = args.get("summary")
        explicit_summary = (
            str(explicit_summary_raw)[:_MAX_EXPLICIT_SUMMARY_LEN]
            if isinstance(explicit_summary_raw, str)
            else None
        )

        if not content or not isinstance(content, str):
            return {"error": "content is required and must be a non-empty string"}
        if len(content) > _MAX_CONTENT_LEN:
            return {"error": f"Content too long ({len(content)} chars). Max: {_MAX_CONTENT_LEN}."}

        try:
            storage = await self.get_storage()
            _brain, err = await _get_brain_or_error(storage)
            if err:
                return err

            # Defense in depth — tool output is a common vector for accidental
            # secret capture (API keys in grep, tokens in curl logs, etc).
            # Mirror the remember_handler safety pipeline.
            from surreal_memory.safety.input_firewall import sanitize_explicit_content
            from surreal_memory.safety.sensitive import auto_redact_content

            content = sanitize_explicit_content(content)
            try:
                redact_severity = int(self.config.safety.auto_redact_min_severity)
            except (TypeError, ValueError, AttributeError):
                redact_severity = 3
            redacted_content, redacted_matches, _hash = auto_redact_content(
                content, min_severity=redact_severity
            )
            redacted = bool(redacted_matches)
            if redacted:
                content = redacted_content
                logger.info(
                    "smem_offload: auto-redacted %d sensitive matches for tool=%s",
                    len(redacted_matches),
                    tool_name,
                )

            summary = explicit_summary or _build_summary(content, tool_name)
            token_estimate = _estimate_tokens(content)
            summary_tokens = _estimate_tokens(summary)
            token_saved = max(0, token_estimate - summary_tokens)

            neuron = Neuron.create(
                type=NeuronType.CONCEPT,
                content=content,
                metadata={
                    "_source": "tool_offload",
                    "_tool_name": tool_name,
                    "_summary": summary,
                    "_offload_token_estimate": token_estimate,
                    "_offload_redacted": redacted,
                },
                ephemeral=True,
            )
            await storage.add_neuron(neuron)

            return {
                "ref_id": neuron.id,
                "summary": summary,
                "token_saved": token_saved,
                "redacted": redacted,
            }
        except Exception:
            logger.error("Offload failed for tool=%s", tool_name, exc_info=True)
            return {"error": "Offload failed"}

    async def _inflate(self, args: dict[str, Any]) -> dict[str, Any]:
        """Retrieve full content of a previously offloaded neuron by ref_id.

        Args:
            ref_id: Neuron ID returned by ``smem_offload`` (required)

        Returns:
            ``{content, tool_name, summary}`` on success, ``{error}`` on failure.
        """
        ref_id = args.get("ref_id")
        if not ref_id or not isinstance(ref_id, str):
            return {"error": "ref_id is required and must be a string"}

        try:
            storage = await self.get_storage()
            neuron = await storage.get_neuron(ref_id)
            if neuron is None:
                return {"error": f"ref_id not found or expired: {ref_id}"}

            meta = neuron.metadata or {}
            if meta.get("_source") != "tool_offload":
                # Don't allow inflate to peek at arbitrary neurons — only offload payloads.
                return {"error": f"ref_id {ref_id} is not an offloaded payload"}

            return {
                "content": neuron.content,
                "tool_name": meta.get("_tool_name", "unknown"),
                "summary": meta.get("_summary", ""),
            }
        except Exception:
            logger.error("Inflate failed for ref_id=%s", ref_id, exc_info=True)
            return {"error": "Inflate failed"}
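
A worked example may help show why the token estimate in this file takes the larger of a word-based and a character-based figure. The sketch below re-implements the arithmetic of `_estimate_tokens` in isolation. `TOKEN_RATIO = 1.3` is an assumed value for illustration only (the real constant lives in `surreal_memory.engine.token_budget` and is not shown in this diff), and `estimate_tokens` is a local copy, not an import.

```python
TOKEN_RATIO = 1.3  # assumed value; the real constant is not shown in this diff


def estimate_tokens(content: str) -> int:
    """Return the larger of the word-based and char-based estimates (min 1)."""
    words = len(content.split())
    word_based = int(words * TOKEN_RATIO)
    char_based = len(content) // 4  # ~4 chars/token rule of thumb
    return max(1, word_based, char_based)


prose = "the quick brown fox jumps over the lazy dog"  # 9 words, 43 chars
blob = "x" * 5000  # one "word", no whitespace

prose_tokens = estimate_tokens(prose)  # word-based wins: int(9 * 1.3) vs 43 // 4
blob_tokens = estimate_tokens(blob)  # char-based wins: int(1 * 1.3) vs 5000 // 4

# token_saved, as computed in _offload: payload estimate minus summary estimate.
summary = "big blob"
token_saved = max(0, blob_tokens - estimate_tokens(summary))
print(prose_tokens, blob_tokens, token_saved)
```

With a word count alone, the 5000-character blob would be estimated at a single token, which is the under-reporting the docstring warns about.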