[Experimental] change inference #19

Open
josephgoksu wants to merge 1 commit into main from refactor/change-inference

Conversation

@josephgoksu
Owner

No description provided.

@greptile-apps

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This is a large experimental refactoring that replaces direct LLM API calls (Eino framework, TEI embedder/reranker) with subprocess invocations of installed AI CLI tools (Claude Code, Gemini CLI, Codex CLI). The architecture shift means users no longer need to configure API keys separately — TaskWing orchestrates whichever AI CLI is already on PATH. The PR also introduces a compress package for token-efficient output filtering, new commands (execute, gain, proxy, hook, init), a centralized icon registry, improved bootstrap UX, and a runner progress UI.

Key changes:

  • New internal/runner/ package: Runner interface with CLIType-dispatched implementations for Claude, Gemini, and Codex; parallel Pool.Execute and sequential Pool.ExecuteSequential; exponential-backoff RetryableInvoke; structured JSON extraction via result.Decode().
  • TEI layer removed: tei_embedder.go, tei_reranker.go, and all reranking logic in knowledge/service.go and knowledge/reranker.go deleted; knowledge retrieval simplified to single-stage.
  • Eino-based agents deleted: analysis_code.go, utility_agents.go, watch_activity.go, watch_agent.go removed; replaced by runner-backed adapters in runner/planning.go.
  • New internal/compress/ package: Pipeline-based output compression (ANSI stripping, deduplication, diff context collapsing, test failure extraction) to reduce token usage in the proxy/gain flow.
  • Issues found:
      ◦ CollapseDiffContext in compress/git.go has a logic bug: the "last 3" context lines before each change are never emitted, and the collapse count message is incorrect.
      ◦ Pool.ExecuteSequential leaves unexecuted job slots as zero-value JobResult{} (nil error, nil result), which can mislead callers.
      ◦ codex_stream.go does not check scanner.Err() after the scan loop.
      ◦ retry.go's isRetryable matches "exit status" and "signal", so permanent failures (bad args, auth errors) are retried through the full retry budget.
      ◦ sqlite.go's UpsertNodeBySummary does not close the rows cursor on the similarity-match early-return path.

Confidence Score: 3/5

  • This PR is risky to merge as-is due to the logic bug in CollapseDiffContext and the misleading zero-value results in ExecuteSequential; the rest of the changes are structurally sound but experimental.
  • Score reflects a well-intentioned architectural shift with good structure overall, but held back by: (1) a confirmed logic bug in the new compress/git.go that produces wrong collapse counts and drops context lines, (2) a subtle API contract issue in Pool.ExecuteSequential that could cause silent data loss in callers, and (3) an overly broad retry condition that wastes time retrying permanent failures. The TEI removal and agent simplification are clean. The runner abstraction is solid.
  • Pay close attention to internal/compress/git.go (CollapseDiffContext logic), internal/runner/pool.go (ExecuteSequential zero-value results), and internal/runner/retry.go (isRetryable breadth).

Important Files Changed

  • internal/runner/runner.go: New core abstraction; defines the Runner interface, CLIType enum, InvokeRequest/ProgressEvent types, and factory functions for spawning Claude Code, Gemini CLI, and Codex CLI as headless subprocesses. Well-structured with clear separation of concerns.
  • internal/runner/pool.go: Introduces parallel and sequential job execution over runner pools. ExecuteSequential has a logic issue: unexecuted jobs after an early break are left as zero-value JobResult{} with nil Error, potentially misleading callers into thinking those jobs succeeded.
  • internal/runner/retry.go: Adds exponential-backoff retry logic for runner invocations. The isRetryable check is overly broad; matching "exit status" and "signal" will retry permanent failures such as missing binaries or auth errors, wasting significant time.
  • internal/runner/result.go: Defines InvokeResult with JSON extraction strategies (full parse → markdown block → brace matching). Claude envelope unwrapping is correctly implemented. Contains data model definitions for bootstrap analysis, plan, and execute outputs.
  • internal/runner/codex_stream.go: Implements a JSONL streaming parser for Codex CLI's exec --json mode. Missing scanner.Err() check after the scan loop; I/O errors would be silently swallowed, potentially returning incomplete output with no error indication.
  • internal/compress/git.go: New git diff compression pipeline. CollapseDiffContext has a logic bug: the "last 3 context lines" before each change are never buffered or emitted, and the collapse count message reports contextRun-6 when contextRun-3 lines are actually dropped.
  • internal/compress/pipeline.go: Clean pipeline abstraction with a Filter type, Pipeline.Run(), Stats, and a Compress() helper. Well-designed and straightforward.
  • internal/memory/sqlite.go: UpsertNodeBySummary return type changed to (string, error). The similarity-match early-return path commits the transaction but does not close the rows cursor before returning, which can block SQLite WAL checkpointing.
  • internal/knowledge/service.go: Simplifies two-stage retrieval by removing reranking (TEI layer removed). response.Timings["rerank"] is now hardcoded to int64(0) instead of being removed, which is misleading in debug output.
  • internal/runner/planning.go: Implements RunnerClarifier, RunnerPlanner, RunnerDecomposer, and RunnerExpander adapters that back the app.* interfaces using the AI CLI runner. Clean adapter pattern with appropriate error wrapping.
  • internal/ui/output.go: New UI output helpers: PrintSuccess/Warning/Error/Hint, phase headers, result lines, bootstrap welcome/summary panels, and plan boxes. Consistent use of lipgloss styling throughout.
  • cmd/bootstrap.go: Refactored to auto-detect AI CLIs and enable analysis automatically. --skip-analyze replaced with --no-analyze/--analyze. bootstrapContext accumulates phase stats for the final summary panel. Cleaner UX overall.
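The Filter/Pipeline design noted for pipeline.go above might look roughly like this. A minimal sketch, not the PR's code: the concrete filters here (StripANSI, DedupLines) are simplified stand-ins for the real ANSI-stripping and deduplication stages, and the Stats/Compress helpers are omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Filter transforms one chunk of CLI output into a smaller one.
type Filter func(string) string

// Pipeline chains filters in order.
type Pipeline struct{ filters []Filter }

// Run applies each filter to the output of the previous one.
func (p Pipeline) Run(s string) string {
	for _, f := range p.filters {
		s = f(s)
	}
	return s
}

// ansiRe matches SGR color/style escape sequences.
var ansiRe = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// StripANSI removes terminal color codes.
func StripANSI(s string) string { return ansiRe.ReplaceAllString(s, "") }

// DedupLines drops consecutive duplicate lines.
func DedupLines(s string) string {
	var out []string
	prev := "\x00"
	for _, l := range strings.Split(s, "\n") {
		if l != prev {
			out = append(out, l)
		}
		prev = l
	}
	return strings.Join(out, "\n")
}

func main() {
	p := Pipeline{filters: []Filter{StripANSI, DedupLines}}
	fmt.Println(p.Run("\x1b[31merr\x1b[0m\nerr\nok"))
}
```

Each filter stays independently testable, and the pipeline order (strip formatting before deduplicating) matters: color codes would otherwise make identical lines compare unequal.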

Sequence Diagram

sequenceDiagram
    participant User
    participant Bootstrap/Plan as Bootstrap / Plan cmd
    participant Runner as runner.Pool
    participant Claude as claudeRunner
    participant Gemini as geminiRunner
    participant Codex as codexRunner
    participant Knowledge as knowledge.Service
    participant SQLite as memory.SQLiteStore

    User->>Bootstrap/Plan: taskwing bootstrap / plan
    Bootstrap/Plan->>Runner: DetectAndCreateRunners()
    Runner-->>Bootstrap/Plan: []Runner (Claude > Gemini > Codex priority)

    Bootstrap/Plan->>Runner: Pool.Execute(jobs) [parallel analysis]
    par Focus area 1
        Runner->>Claude: Invoke(BootstrapAnalysisPrompt)
        Claude-->>Runner: InvokeResult (JSON)
    and Focus area 2
        Runner->>Gemini: Invoke(BootstrapAnalysisPrompt)
        Gemini-->>Runner: InvokeResult (JSON)
    and Focus area 3
        Runner->>Codex: Invoke(BootstrapAnalysisPrompt)
        Codex-->>Runner: InvokeResult (JSON)
    end

    Runner-->>Bootstrap/Plan: []JobResult

    Bootstrap/Plan->>Bootstrap/Plan: result.Decode() → BootstrapAnalysis
    Bootstrap/Plan->>Knowledge: AddNode(finding) per result
    Knowledge->>SQLite: UpsertNodeBySummary(node)
    SQLite-->>Knowledge: (nodeID, error)

    Bootstrap/Plan->>Runner: Pool.ExecuteSequential(taskJobs) [task execution]
    Runner->>Claude: InvokeWithFiles(TaskExecutionPrompt)
    Claude-->>Runner: InvokeResult
    Runner-->>Bootstrap/Plan: []JobResult

    Bootstrap/Plan-->>User: Summary panel (findings, symbols, duration)

Comments Outside Diff (2)

  1. internal/compress/git.go, line 67-101 (link)

    CollapseDiffContext collapse count and missing "last 3" lines

    The comment says "Keep first 3 and last 3 context lines, collapse middle", but the implementation is wrong in two ways:

    1. Lines 4+ in a context run are dropped and never buffered, so the 3 lines immediately before the next change are never emitted.
    2. The collapsed count message is contextRun - 6, but since only the first 3 lines are actually emitted (not 6), the real number of dropped lines is contextRun - 3.

    Example: 10 unchanged context lines followed by a + line

    • Lines 1–3 → added to result
    • Lines 4–10 → silently dropped (7 lines lost)
    • Message printed: ... (4 unchanged lines) (10−6=4, but 7 were dropped)

    The correct approach is to buffer lines 4+ and flush the last 3 of the buffer when a change is encountered:

    } else {
        contextBuf = append(contextBuf, line)
        if len(contextBuf) > 3 {
            // keep only last 3
            contextBuf = contextBuf[len(contextBuf)-3:]
        }
    }

    And when a change is encountered, emit the buffer contents (after the collapse summary).

  2. internal/memory/sqlite.go, line 1411-1414 (link)

    rows cursor not closed on early return from similarity-match branch

    When a semantically similar node is found and the transaction is committed, the function returns immediately without closing rows. The deferred rollbackWithLog will attempt to roll back an already-committed transaction (a no-op) but the cursor stays open.

    In SQLite's WAL mode (the common default for go-sqlite3), an unclosed read cursor can block checkpointing and eventually starve writers.

Last reviewed commit: a91bd2c

Comment on lines +80 to +113
}

// ExecuteSequential runs jobs one at a time using the first runner.
// Use this for file-modifying operations (task execution) where order matters.
func (p *Pool) ExecuteSequential(ctx context.Context, jobs []Job) []JobResult {
	if len(p.runners) == 0 {
		return []JobResult{{Error: fmt.Errorf("no runners available")}}
	}

	runner := p.runners[0]
	results := make([]JobResult, len(jobs))

	for i, job := range jobs {
		if ctx.Err() != nil {
			results[i] = JobResult{
				JobID: job.ID,
				Error: ctx.Err(),
			}
			break
		}

		result, err := runner.InvokeWithFiles(ctx, job.Request)
		results[i] = JobResult{
			JobID:      job.ID,
			Result:     result,
			Error:      err,
			RunnerType: runner.Type(),
		}

		// Stop on failure
		if err != nil {
			break
		}
	}

Zero-value results for unexecuted jobs are misleading

When ExecuteSequential breaks early — either due to context cancellation or a job error — the remaining entries in results are left as zero-value JobResult{} values: nil error, nil result, and an empty JobID. Callers that iterate over all returned results and check only result.Error != nil will silently treat those unexecuted jobs as succeeded.

Consider explicitly marking unexecuted entries, e.g.:

// After the loop, fill in unexecuted jobs (note: hoist the index i out
// of the range clause so it is still in scope here)
for j := i + 1; j < len(jobs); j++ {
    results[j] = JobResult{
        JobID: jobs[j].ID,
        Error: fmt.Errorf("job not executed: previous job failed"),
    }
}

Or at minimum document that callers must check for an empty JobID to detect un-run jobs.

Comment on lines +101 to +103
			}
		}
	}

Scanner error is never checked

After the scan loop exits, scanner.Err() is never inspected. If the process's stdout pipe closes with an I/O error (e.g., the subprocess was killed), the error is silently swallowed and lastAssistantText may be incomplete with no indication of failure.

Suggested change
			}
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, fmt.Errorf("codex stdout scan: %w", err)
	}
	waitErr := cmd.Wait()

Comment on lines +56 to +67
// isRetryable returns true for errors that warrant a retry.
func isRetryable(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	// Retry on timeouts, CLI failures, and JSON parse errors
	for _, keyword := range []string{"timeout", "timed out", "JSON", "json", "exit status", "signal"} {
		if strings.Contains(msg, keyword) {
			return true
		}
	}

Overly broad retry keywords catch permanent failures

"exit status" and "signal" are matched as retryable, but virtually every subprocess failure wraps one of those phrases (e.g., "claude invocation failed (exit 1)", "exit status 127" for a missing binary, or "signal: killed" after a manual kill). This means auth failures, invalid arguments, missing binaries, and out-of-memory kills are all retried up to defaultRetries (4) times before finally failing — wasting minutes of wall-clock time.

Consider narrowing the set to genuinely transient signals/codes, or at minimum removing "exit status" and keeping only specific known-transient patterns:

Suggested change
  // isRetryable returns true for errors that warrant a retry.
  func isRetryable(err error) bool {
  	if err == nil {
  		return false
  	}
  	msg := err.Error()
- 	// Retry on timeouts, CLI failures, and JSON parse errors
- 	for _, keyword := range []string{"timeout", "timed out", "JSON", "json", "exit status", "signal"} {
- 		if strings.Contains(msg, keyword) {
- 			return true
- 		}
- 	}
+ 	for _, keyword := range []string{"timeout", "timed out", "connection refused", "connection reset"} {
+ 		if strings.Contains(msg, keyword) {
+ 			return true
+ 		}
+ 	}
