fix(haiku): pass prompt on stdin, not argv — saves of long sessions die with E2BIG #107

Merged: fdaviddpt merged 1 commit into Digital-Process-Tools:main from kays0x:fix/haiku-stdin-e2big on Jun 21, 2026
kays0x commented Jun 20, 2026

Contributor

Summary

call_haiku passes the entire prompt as a single command-line argument (claude -p <prompt>). On Linux, a single argv string is capped at MAX_ARG_STRLEN, defined as PAGE_SIZE * 32, which is 131072 bytes (128 KB) with 4 KB pages. When a session's extract reaches that size, subprocess.run fails at exec() with OSError: [Errno 7] Argument list too long, which surfaces as a RuntimeError, and the save is lost.

The failure is silent (only a log line) and permanent for that session — the extract only grows, so every subsequent save fails the same way. It strikes exactly the sessions a memory tool most wants to capture: long ones, recovery of a missed session, or any accumulated delta over 128 KB.
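As a rough illustration of the limit described above, the following sketch recomputes the per-string cap on the current machine and prints it next to the separate cap on the combined size of argv and the environment. It assumes Linux; the variable names are illustrative and not part of the project.

```python
import os

# MAX_ARG_STRLEN is a kernel constant defined as PAGE_SIZE * 32. It is not
# exposed through sysconf, so it is recomputed here (assumes Linux).
page_size = os.sysconf("SC_PAGE_SIZE")
max_arg_strlen = page_size * 32

# ARG_MAX is a different limit: the cap on argv and envp combined.
arg_max = os.sysconf("SC_ARG_MAX")

print(f"page size:            {page_size} bytes")
print(f"per-string argv cap:  {max_arg_strlen} bytes")
print(f"total argv+envp cap:  {arg_max} bytes")
```

A single oversized prompt hits the per-string cap no matter how much of the total budget is still free, which is why the rest of the command line being short does not help.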

Root cause

pipeline/haiku.py:

cmd = ["claude", "-p", prompt, ...]      # the whole prompt travels as one argv string
subprocess.run(cmd, ...)                 # exec() fails with E2BIG once the encoded prompt reaches 128 KB

Reproduction (no API call needed — it fails at exec)

import subprocess
big = "x" * (128 * 1024)
subprocess.run(["/bin/true", big])       # OSError: [Errno 7] Argument list too long
subprocess.run(["/bin/cat"], input=big, text=True)  # fine — stdin has no argv limit

The threshold is exact: a 131071-byte argument succeeds, while a 131072-byte (128 KB) argument fails with E2BIG, because the limit counts the terminating NUL byte.
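The threshold can be probed with a small helper like the one below. This is a sketch, assuming Linux with 4 KB pages; argv_fits is a hypothetical name, and the Python interpreter stands in for /bin/true so the script does not depend on a particular path.

```python
import errno
import subprocess
import sys


def argv_fits(n_bytes: int) -> bool:
    """Return True if a single argv string of n_bytes ASCII bytes survives exec()."""
    try:
        # The child does nothing; only the exec() of the oversized argv matters.
        subprocess.run([sys.executable, "-c", "pass", "x" * n_bytes], check=True)
        return True
    except OSError as exc:
        if exc.errno == errno.E2BIG:
            return False
        raise  # any other failure is unrelated to the argv limit


if __name__ == "__main__":
    for n in (131071, 131072):
        print(n, "ok" if argv_fits(n) else "E2BIG")
```

On systems with a larger page size the boundary moves accordingly, so the two probe sizes would need adjusting.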

Fix

claude -p reads the prompt from stdin when no positional prompt is given. Drop the positional and pipe via input=:

cmd = ["claude", "-p", "--output-format", "json", ...]   # no positional prompt
subprocess.run(cmd, input=prompt, ...)                    # prompt on stdin, unbounded

stdin is a pipe and is not subject to the argv size cap, so a prompt of any length is delivered. There is no behavior change for normal-size prompts.
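The pattern can be tried without the claude CLI. In the sketch below, run_with_prompt_on_stdin is a hypothetical helper (not the project's call_haiku), and a small Python child that reports how many characters it read from stdin stands in for claude -p.

```python
import subprocess
import sys


def run_with_prompt_on_stdin(cmd: list[str], prompt: str) -> str:
    """Run cmd with prompt on stdin and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        cmd,
        input=prompt,          # delivered through a pipe, not argv
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


# Stand-in child: reads all of stdin and prints the number of characters.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

# 2 MiB, well beyond the 128 KB per-argument cap.
prompt = "x" * (2 * 1024 * 1024)
print(run_with_prompt_on_stdin(child, prompt).strip())
```

The command list stays short regardless of prompt size, so exec() never sees the payload.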

Testing

  • Adds test_call_haiku_sends_prompt_on_stdin_not_argv, which asserts that the prompt is passed via input= and is absent from the command argv. It passes on this change and fails against the old argv form. The existing tests were blind to the bug (they checked cmd[0] and env, never input=), so this test keeps the regression from silently returning. tests/test_haiku.py now passes 31/31.
  • Full suite: no new failures vs the pre-change baseline.
  • Real-environment: a save whose extract was ~1.9 MB failed with call-haiku error: [Errno 7] Argument list too long before the change, and completed normally (Haiku returned a summary, entry written) after it.
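A test of the kind described in the first bullet might look roughly like this. It is a sketch, not the test added in this PR: call_haiku_sketch is a hypothetical stand-in for the real call_haiku, and subprocess.run is mocked so nothing is executed.

```python
import subprocess
from unittest import mock


def call_haiku_sketch(prompt: str) -> str:
    """Hypothetical stand-in for call_haiku: the prompt goes on stdin, never in argv."""
    cmd = ["claude", "-p", "--output-format", "json"]
    result = subprocess.run(
        cmd, input=prompt, capture_output=True, text=True, encoding="utf-8"
    )
    return result.stdout


def test_prompt_on_stdin_not_argv() -> None:
    prompt = "x" * (200 * 1024)  # larger than the 128 KB per-argument cap
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="{}", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        call_haiku_sketch(prompt)

    args, kwargs = run.call_args
    cmd = args[0]
    assert kwargs["input"] == prompt   # first half: prompt travels via input=
    assert prompt not in cmd           # second half: prompt is absent from argv
```

Checking both halves matters: asserting only on input= would still pass if the prompt were accidentally left in the command list as well.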

Risk

Minimal. subprocess.run(..., input=, capture_output=True) uses communicate() internally, so stdin is written and stdout is read concurrently and there is no stdin/stdout deadlock. input is a str, consistent with text=True/encoding="utf-8".
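The no-deadlock claim can be exercised with a child that echoes stdin to stdout while it is still reading, which is the pattern that stalls a naive write-everything-then-read parent once the pipe buffers fill. The sketch below uses a Python child as a stand-in; the 1 MiB payload size is an arbitrary choice.

```python
import subprocess
import sys

# Child copies stdin to stdout chunk by chunk, so it produces output while
# the parent is still feeding input.
child = [
    sys.executable,
    "-c",
    "import shutil, sys; shutil.copyfileobj(sys.stdin, sys.stdout)",
]

payload = "x" * (1024 * 1024)  # far larger than a typical pipe buffer

result = subprocess.run(
    child,
    input=payload,
    capture_output=True,
    text=True,
    encoding="utf-8",
    timeout=60,  # a deadlock would show up as TimeoutExpired rather than a hang
)
print(len(result.stdout) == len(payload))
```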

Not changed

Scope is the single argv→stdin swap. No change to flags, parsing, env stripping, or token accounting.

…ions)

call_haiku passed the full prompt as a single argv string (claude -p <prompt>).
Linux caps one argument at MAX_ARG_STRLEN (131072 bytes = 128KB), so a prompt
at or above that size fails at exec() with OSError [Errno 7] "Argument list too
long" — surfaced as RuntimeError and the save is lost. The failure is silent
(log only) and permanent for that session, since the extract only grows. It
hits exactly the sessions worth capturing: long ones, recovery of a missed
session, and any accumulated delta over 128KB.

claude -p reads the prompt from stdin when no positional prompt is given, so
pass it via subprocess input= instead. stdin has no argv size limit; no
behavior change for normal-size prompts.

Adds a regression test asserting the prompt arrives via input= and is absent
from the command argv (it fails against the old argv form). Repro:
subprocess.run(["/bin/true", "x"*131072]) raises E2BIG; the same payload via
input= on stdin runs fine.

fdaviddpt left a comment

Contributor

Clean fix — and a nasty bug well caught.

The failure mode is the scary part: extracts only grow, so once a session crosses 128KB every subsequent save dies, silently (just a log line). That hits exactly the long sessions a memory tool most wants to keep.

The fix is correct and minimal — claude -p reads stdin with no positional, subprocess.run(input=...) uses communicate() so no deadlock, and there's no behavior change for normal-size prompts. Bonus: prompt no longer shows up in the process list.

The test earns its keep too — guards both halves (prompt in input=, absent from argv) and fails against the old form, so the regression can't sneak back. Nice catch on the old tests being blind to it.

Thanks Kevin — repro without an API call, exact threshold, real-env validation on a ~1.9MB extract. That's how a bug fix should read. 🙏

fdaviddpt merged commit 1c80bfb into Digital-Process-Tools:main on Jun 21, 2026
12 checks passed