feat: add QwenPaw agent support #694

Open
gkld wants to merge 3 commits into kenn-io:main from gkld:feat/qwenpaw-agent

Conversation

gkld commented Jun 15, 2026

Adds the QwenPaw coding agent to the registry and sync pipeline.

QwenPaw stores daily conversation transcripts as
<workspace>/dialog/<YYYY-MM-DD>.jsonl under ~/.copaw/workspaces/.
Each runtime hosts multiple agent workspaces (default,
fund_manager, note_keeper, researcher, ...), and each workspace
logs one JSONL file per active day.

The parser handles Anthropic-style content blocks (text, thinking,
tool_use, tool_result) with one QwenPaw quirk: tool results live in
role: "system" messages rather than user messages. Those map to
RoleUser + IsSystem so they remain distinguishable from real user
turns without inflating UserMessageCount. Timestamps use the
"YYYY-MM-DD HH:MM:SS.fff" local-time format and are parsed via
time.ParseInLocation(..., time.Local), mirroring the Hermes parser.
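
As a rough illustration of the timestamp handling described above, the sketch below parses the "YYYY-MM-DD HH:MM:SS.fff" form with time.ParseInLocation and time.Local. The names qwenPawTimeLayout and parseQwenPawTimestamp are hypothetical and are not taken from the PR; the actual parser may be structured differently.

```go
package main

import "time"

// qwenPawTimeLayout is the Go reference layout for "YYYY-MM-DD HH:MM:SS.fff".
// ".000" requires exactly three fractional digits (milliseconds).
// Hypothetical name, used here only for illustration.
const qwenPawTimeLayout = "2006-01-02 15:04:05.000"

// parseQwenPawTimestamp interprets s as a wall-clock time in the machine's
// local zone, since the transcript format carries no zone offset.
func parseQwenPawTimestamp(s string) (time.Time, error) {
	return time.ParseInLocation(qwenPawTimeLayout, s, time.Local)
}
```

Parsing with plain time.Parse would instead treat the value as UTC, which would shift every message by the local offset.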

Raw session IDs use the form <workspace>:<date>, yielding full IDs
like qwenpaw:default:2026-04-19. Discovery walks
<root>/<workspace>/dialog/*.jsonl; the watcher path classifier and
project extraction in internal/sync/engine.go were extended to match.

Scope / non-goals:

  • Only dialog/*.jsonl is parsed. The sessions/*.json agent-memory
    snapshots overlap with dialog content and are skipped for now.
  • inbox_traces/*.json (cron run traces) and inbox_events.json are
    notification-shaped, not conversations, and are out of scope.
  • Token usage aggregation from token_usage.json is not wired up; it
    is keyed by day/model rather than by session.

Reviewers should look at:

  • internal/parser/qwenpaw.go — discovery, source resolution, parse
    loop, role mapping, timestamp parsing.
  • internal/parser/qwenpaw_test.go — table-driven coverage of the
    happy path plus malformed lines, empty content, missing timestamps,
    multiple tool_use blocks, and the system-role tool_result mapping.
  • internal/sync/engine.go — four insertion points (watcher path
    classification, process dispatch switch, new processQwenPaw,
    project extraction case).
  • internal/parser/types.go — new AgentQwenPaw constant and
    registry entry (EnvVar: QWENPAW_DIR, DefaultDirs: [".copaw/workspaces"], IDPrefix: "qwenpaw:").
  • frontend/src/lib/utils/agents.ts — label/color entry (cyan,
    matching Qwen Code; can be re-tinted separately if a distinct
    visual identity is desired).

Adds the QwenPaw coding agent to the registry and sync pipeline.
QwenPaw stores daily conversation transcripts as
<workspace>/dialog/<YYYY-MM-DD>.jsonl under ~/.copaw/workspaces/.

The parser handles Anthropic-style content blocks (text, thinking,
tool_use, tool_result) with system-role carriers for tool results,
which map to RoleUser + IsSystem so they stay distinguishable from
real user turns without inflating UserMessageCount. Timestamps use
the "YYYY-MM-DD HH:MM:SS.fff" local-time format.
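
The role mapping described above can be sketched roughly like this. The Role and message types, mapRole, and userMessageCount are simplified, hypothetical versions written for illustration; the real parser types carry more fields.

```go
package main

// Role is a simplified stand-in for the parser's role type.
type Role string

const (
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
)

// message keeps only the two fields relevant to the mapping.
type message struct {
	Role     Role
	IsSystem bool
}

// mapRole converts a raw QwenPaw role string. "system" messages carry tool
// results, so they become RoleUser with IsSystem set instead of a real user turn.
func mapRole(raw string) (message, bool) {
	switch raw {
	case "user":
		return message{Role: RoleUser}, true
	case "assistant":
		return message{Role: RoleAssistant}, true
	case "system":
		return message{Role: RoleUser, IsSystem: true}, true
	default:
		return message{}, false
	}
}

// userMessageCount counts genuine user turns and ignores system carriers.
func userMessageCount(msgs []message) int {
	n := 0
	for _, m := range msgs {
		if m.Role == RoleUser && !m.IsSystem {
			n++
		}
	}
	return n
}
```

Under this scheme a transcript with one user prompt, one assistant reply, and one system-role tool result counts a single user message.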

Raw session IDs use the form "<workspace>:<date>", yielding full
IDs like "qwenpaw:default:2026-04-19".

roborev-ci Bot commented Jun 15, 2026

roborev: Combined Review (0d11c0f)

Summary verdict: one medium parser robustness issue should be fixed before merging.

Medium

  • Location: internal/parser/qwenpaw.go:122
  • Problem: The QwenPaw parser caps JSONL lines at 8 MiB and returns an error when a larger tool-result line is encountered, which can prevent syncing the whole QwenPaw session. Other JSONL parsers in this repo use the shared 64 MiB maxLineSize/lineReader path to tolerate large local agent outputs.
  • Fix: Use newLineReader(f, maxLineSize), or at least raise the scanner limit to the shared maximum and handle oversized lines consistently.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 6m50s), codex_security (codex/security, done, 2m18s) | Total: 9m16s

Switches the QwenPaw JSONL loop from a bufio.Scanner with an 8 MiB
cap to newLineReader(f, maxLineSize), matching the other JSONL
parsers in this package.

Two behavior improvements:

- The line cap rises from 8 MiB to the shared 64 MiB maximum, so
  legitimate large tool-result lines no longer abort the parse.
- Oversized lines are silently skipped (per the lineReader contract)
  instead of failing the whole session; the surrounding messages
  still sync.

Adds TestParseQwenPawSession_OversizedLineSkipped to lock in the
new behavior.
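
For readers unfamiliar with the skip-on-oversize behavior, the following sketch shows one way a capped line reader can drop an oversized line and keep going. It is not the repository's newLineReader; forEachLine and collectLines are hypothetical names, and the implementation is an assumption built on bufio.Reader.ReadLine.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// forEachLine calls fn for every line in r that is at most maxLine bytes long.
// Longer lines are dropped without an error. The slice passed to fn is reused
// between calls, so fn must copy it if it needs to keep the data.
func forEachLine(r io.Reader, maxLine int, fn func(line []byte)) error {
	br := bufio.NewReaderSize(r, 64*1024)
	var buf []byte
	oversized := false
	for {
		// ReadLine returns a line in chunks; isPrefix is true while more
		// chunks of the same line remain.
		chunk, isPrefix, err := br.ReadLine()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if !oversized {
			if len(buf)+len(chunk) > maxLine {
				oversized = true // stop accumulating; discard the rest of this line
			} else {
				buf = append(buf, chunk...)
			}
		}
		if !isPrefix { // end of the current line
			if !oversized {
				fn(buf)
			}
			buf = buf[:0]
			oversized = false
		}
	}
}

// collectLines is a small usage helper that gathers the surviving lines.
func collectLines(data string, maxLine int) ([]string, error) {
	var out []string
	err := forEachLine(strings.NewReader(data), maxLine, func(line []byte) {
		out = append(out, string(line))
	})
	return out, err
}
```

Unlike a bufio.Scanner, which stops with bufio.ErrTooLong once a token exceeds its buffer, this shape lets the lines around an oversized one still be processed.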

roborev-ci Bot commented Jun 15, 2026

roborev: Combined Review (3214b68)

Summary: Medium issues remain in the QwenPaw parser; no exploitable security issues were reported.

Medium

  • internal/parser/qwenpaw.go:315: FileInfo.Mtime is stored in seconds, while the sync engine and DB expect nanoseconds. After the first sync, unchanged QwenPaw files will not match shouldSkipByPath, causing repeated reparses/upserts and local-modified churn.

    • Fix: Use info.ModTime().UnixNano() and add coverage for the stored mtime unit.
  • internal/parser/qwenpaw.go:232: Tool results omit ContentLength, so paired tool calls persist result_content_length as zero. For default-blocked categories like Read, this loses the only retained result-size signal.

    • Fix: Populate ContentLength from block.Output using the existing tool-result length helper and assert it in the QwenPaw tool-result test.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 8m56s), codex_security (codex/security, done, 2m35s) | Total: 11m40s

Two review-driven fixes on the QwenPaw parser:

- FileInfo.Mtime now uses info.ModTime().UnixNano() instead of
  .Unix(). The sync engine compares File.Mtime against os.Stat in
  nanoseconds, so the previous seconds-precision value would never
  match shouldSkipByPath after the first sync, forcing every QwenPaw
  file to reparse on every pass.

- ParsedToolResult.ContentLength is now derived from block.Output via
  the shared toolResultContentLength helper. Without it, default-
  blocked categories like Read lost their only retained result-size
  signal (result_content_length persisted as zero).

Adds TestParseQwenPawSession_FileMtimeIsNanoseconds and extends
TestParseQwenPawSession_ToolUseAndResult to assert ContentLength on
the paired tool result.

roborev-ci Bot commented Jun 15, 2026

roborev: Combined Review (7b6c89a)

No Medium, High, or Critical findings were reported.

The security review found no issues. The remaining findings were Low severity and are omitted per the review-combination rules.


Panel: ci_default_security | Synthesis: codex, 5s | Members: codex_default (codex/default, done, 7m50s), codex_security (codex/security, done, 1m34s) | Total: 9m29s
