
fix(core): extract pi-cli tool calls from streaming events #782

Merged
christso merged 5 commits into main from fix/780-pi-cli-skill-trigger on Mar 26, 2026
Conversation

christso (Collaborator) commented Mar 26, 2026

Summary

  • Pi CLI emits tool_execution_start/tool_execution_end events in its JSONL output, but the provider only extracted tool calls from message content arrays (tool_use/toolCall format). As a result, the skill-trigger evaluator reported 'Skill "X" not found in N tool call(s)' even when pi had successfully loaded the skill.
  • extractMessages() now also scans for tool_execution_start/tool_execution_end events and injects reconstructed ToolCall objects into assistant messages, deduplicating against any tool calls already present in message content.
  • Also handles the tool_call (snake_case) content type variant and logs tool_start/tool_end events in the stream logger for easier debugging.
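
The event scan and deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the event field names (toolCallId, toolName, args, result), the ToolCall shape, and the helper names toolCallsFromEvents and mergeToolCalls are all assumptions made for the example.

```typescript
// Hypothetical shapes: these field names are assumptions, not the provider's real types.
interface ToolCall {
  tool: string;
  id?: string;
  input?: unknown;
  output?: unknown;
}

interface PiEvent {
  type: string;
  toolCallId?: string;
  toolName?: string;
  args?: unknown;
  result?: unknown;
}

// Rebuild tool calls from tool_execution_start events and attach the
// result from the matching tool_execution_end event when one is found.
function toolCallsFromEvents(events: readonly PiEvent[]): ToolCall[] {
  const byId = new Map<string, ToolCall>();
  const ordered: ToolCall[] = [];
  for (const event of events) {
    if (event.type === "tool_execution_start" && event.toolName) {
      const call: ToolCall = {
        tool: event.toolName,
        id: event.toolCallId,
        input: event.args,
      };
      ordered.push(call);
      if (event.toolCallId) byId.set(event.toolCallId, call);
    } else if (event.type === "tool_execution_end" && event.toolCallId) {
      const call = byId.get(event.toolCallId);
      if (call) call.output = event.result;
    }
  }
  return ordered;
}

// Append event-derived calls to those already found in message content,
// skipping any call whose id (or tool name plus input) is already present.
function mergeToolCalls(
  existing: readonly ToolCall[],
  fromEvents: readonly ToolCall[],
): ToolCall[] {
  const keyOf = (c: ToolCall): string =>
    c.id ?? `${c.tool}:${JSON.stringify(c.input)}`;
  const seen = new Set(existing.map(keyOf));
  const merged = [...existing];
  for (const call of fromEvents) {
    const key = keyOf(call);
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(call);
    }
  }
  return merged;
}
```

Keying on the call id when one is available, and falling back to tool name plus serialized input otherwise, is one plausible way to avoid counting the same call twice when it appears both in message content and in the event stream.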

Closes #780

Test plan

  • New unit tests for extractToolCallsFromEvents and extractMessages covering: event-only tool calls, deduplication, multiple tools, synthetic assistant message creation, turn_end fallback, and tool_call content type
  • Existing skill-trigger evaluator tests pass (27 tests)
  • Full test suite passes (1193 core + 63 eval tests)
  • Build + typecheck + lint all pass
  • Manual verification with a real pi-cli eval run containing skill files (pi v0.58.1 with openai/gpt-5.1-codex via OpenRouter). The skill-trigger evaluator correctly detects the read tool loading .agents/skills/csv-analyzer/SKILL.md (score 1.000). Confirmed that tool_execution_start/tool_execution_end events are present in the JSONL and that the new tool_start/tool_end log lines are visible.
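
To illustrate what the manual check is looking for, here is a rough sketch of how a skill-trigger style check could recognize a read call that loads a skill file. The evaluator's actual matching logic is not shown in this PR, so the RecordedToolCall shape, the skillWasLoaded helper, and the path-matching rule are all assumptions.

```typescript
// Hypothetical shape for a recorded tool call; not the evaluator's real type.
interface RecordedToolCall {
  tool: string;
  input?: unknown;
}

// Returns true when any read call's input mentions the skill's SKILL.md path.
// Assumption: a skill lives at <something>/skills/<skillName>/SKILL.md.
function skillWasLoaded(
  toolCalls: readonly RecordedToolCall[],
  skillName: string,
): boolean {
  const needle = `/skills/${skillName}/SKILL.md`;
  return toolCalls.some((call) => {
    if (call.tool !== "read") return false;
    // Serialize the input so the check works whatever key holds the path.
    const serialized = JSON.stringify(call.input ?? "");
    return serialized.includes(needle);
  });
}
```

With the read call from the manual run (path .agents/skills/csv-analyzer/SKILL.md) in the list, skillWasLoaded(calls, "csv-analyzer") would return true under these assumptions, and it would return false if the list were empty, which mirrors the failure described in #780.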

🤖 Generated with Claude Code

cloudflare-workers-and-pages bot commented Mar 26, 2026

Deploying agentv with Cloudflare Pages

Latest commit: a9f27f5
Status: ⚡️ Build in progress...

christso and others added 5 commits March 26, 2026 08:47
…trigger

Pi CLI emits tool_execution_start/end events in JSONL output, but the
provider only extracted tool calls from message content arrays. This
caused the skill-trigger evaluator to miss pi's skill file reads.

Now extractMessages() also scans for tool_execution_start/end events
and injects reconstructed tool calls into assistant messages. Also
handles tool_call (snake_case) content type variant.

Closes #780

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace target message with a new object instead of casting to bypass
readonly constraint, per code review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Re-adds the skill-trigger assertion that was removed as a workaround
for #780. Now that pi-cli tool call extraction is fixed, the evaluator
can detect when pi loads the agent-plugin-review skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pi-cli target needs subprovider/model/api_key to produce meaningful
output. Without them, pi uses its default which returns empty responses.

Also removes workers: 1 from agent-plugin-review eval since all test
cases are read-only reviews that can safely run in parallel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
christso force-pushed the fix/780-pi-cli-skill-trigger branch from 2989cbd to a9f27f5 on March 26, 2026 08:49
christso merged commit 6d8f631 into main on Mar 26, 2026
1 check was pending
christso deleted the fix/780-pi-cli-skill-trigger branch on March 26, 2026 08:50
Development

Successfully merging this pull request may close these issues.

skill-trigger evaluator cannot detect pi-cli skill loading