Skip to content

fix(mcp): stop memory tool hangs from s-expression parser infinite loop#1421

Merged
bplatz merged 1 commit into
mainfrom
fix/mcp-sexpr-parser-hang
Jul 5, 2026
Merged

fix(mcp): stop memory tool hangs from s-expression parser infinite loop#1421
bplatz merged 1 commit into
mainfrom
fix/mcp-sexpr-parser-hang

Conversation

@bplatz

@bplatz bplatz commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Problem

memory_add (and memory_recall) on the fluree MCP server could hang forever — the tool call never responds and one tokio worker gets permanently pinned at 100% CPU. Bursts of 3+ concurrent adds made this near-certain, and each occurrence eats another runtime worker until the whole server starves. PR #1384's lock work is sound but was unrelated: all memory locks are 30s-bounded and can't produce an unbounded hang.

Root cause

Diagnosed on a live wedged v4.1.1 mcp serve process (12.7h of accumulated CPU) via sample/lldb: the spinning thread was re-trimming the same content slice starting with ) inside parse_s_expression_args.

The chain:

  1. After every successful add, the related-memories lookup feeds the raw memory content back as a recall query.
  2. recall_fulltext interpolates it into a (fulltext ?content "...") bind expression, escaping only "\".
  3. The s-expression parser had no escape support, so the string token ended at the first embedded quote, leaving the rest of the content as garbage tokens.
  4. When the leftover text starts with ), parse_s_expression_args finds the delimiter at position 0, consumes zero bytes, and loops forever.

Memory content containing a double quote followed by a paren — routine for code-flavored insights — reliably wedges every v4.1.1 install. Reproduced end-to-end over MCP stdio with a single add.

Fixes

  • Termination guard: an argument token that consumes no input is now a ParseError, guaranteeing the parse loop terminates on any malformed input (parse_s_expression_args, parse_s_expression_arg).
  • Escape support: string literals honor \" and \\ throughout — quoted_string_end/unescape_string in filter_sexpr, the sexpr tokenizer, and find_matching_paren; find_matching_bracket is now also quote-aware. Unrecognized \x sequences are preserved verbatim.
  • Proper interpolation: recall_fulltext escapes backslashes before quotes so arbitrary content round-trips through the parser.
  • Init race: MemoryStore::initialize tolerates losing the create_ledger race — concurrent first-burst tool calls previously failed with Ledger already exists: __memory:main.

Testing

  • New parser regression tests, including the exact wedge shape (stray ')' errors instead of looping, escaped quotes with parens, hostile fulltext round-trip)
  • New store-level test: recall_fulltext with quote+paren+backslash content
  • Full fluree-db-query lib suite (1,177), fluree-db-memory suite (64), and grp_query integration suite (299) pass; clippy clean
  • Verified end-to-end against a rebuilt CLI over MCP stdio: 4 concurrent first-burst adds with hostile content + a hostile recall all succeed with idle CPU afterward (previously: no response, core pinned)

memory_add could hang forever, permanently pinning a tokio worker at
100% CPU. After every add, the related-memories lookup feeds the raw
memory content back through recall_fulltext, which interpolates it into
a '(fulltext ?content "...")' bind expression escaping only double
quotes. The s-expression parser had no escape support, so the string
token ended at the first embedded quote — and once the leftover text
started with ')', parse_s_expression_args consumed zero bytes per
iteration and looped forever. Content containing a quote followed by a
paren (common for code-flavored memories) wedged the server; each
occurrence ate one runtime worker until the whole MCP server starved.

Fixes:
- parse_s_expression_args / parse_s_expression_arg: a token that
  consumes no input is now a ParseError, guaranteeing termination on
  any malformed input
- string literals honor backslash escapes (\" and \\) throughout:
  quoted_string_end/unescape_string in filter_sexpr, the sexpr
  tokenizer, and find_matching_paren; find_matching_bracket is now
  also quote-aware
- recall_fulltext escapes backslashes before quotes so content
  round-trips through the parser
- MemoryStore::initialize tolerates losing the create_ledger race
  (concurrent first-burst tool calls returned 'Ledger already exists'
  errors)

Diagnosed on a live wedged v4.1.1 serve process via sample/lldb: the
spinning thread was re-trimming the same 231-byte content slice
starting with ')'. Reproduced end-to-end over MCP stdio with a single
add whose content contains '") '; regression tests cover the parser
and the recall path.
@bplatz bplatz requested review from aaj3f and zonotope July 2, 2026 12:09

@aaj3f aaj3f left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grateful for this -- was going to investigate why I still saw the issue after the other PR merged, but this makes sense

@bplatz bplatz merged commit 3cdb58c into main Jul 5, 2026
13 checks passed
@bplatz bplatz deleted the fix/mcp-sexpr-parser-hang branch July 5, 2026 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants