fix(bootstrap): resolve 9 bootstrap defects with 50+ regression tests by josephgoksu · Pull Request #20 · josephgoksu/TaskWing

josephgoksu · 2026-03-07T20:13:46Z

Summary

RootPath→HOME: GetMemoryBasePath rejects MarkerNone contexts; Detect() skips .taskwing markers above multi-repo workspaces
FK constraint failures: LinkNodes pre-checks node existence before INSERT (avoids SQLite error 787)
IsMonorepo misclassification: Detect() checks hasNestedProjects() in MarkerNone fallback, resolving disagreement with DetectWorkspace()
Zero docs loaded: Added LoadForServices to DocLoader; wired into RunDeterministicBootstrap via workspace auto-detection
Hallucinated findings: Gate 3 enforcement — findings without evidence start as "skipped"; added HasEvidence() and NeedsHumanVerification()
Claude MCP drift: Filesystem-based drift detection with consent enforcement for global mutations
Git exit 128: IsGitRepository() pre-check before running git log
Gemini MCP install: installGeminiCLI now returns errors properly
CI: Added bootstrap-integration job with failfast regression suite

Test plan

go test ./... -run TestBootstrapRepro -v — all 9 defect repro tests pass
go test ./... -run TestKnowledgeLinking_NoFK -v — FK constraint tests pass
go test ./... -run TestRootPathResolution -v — path resolution tests pass
go test ./... -run TestIsMonorepoDetection -v — workspace detection tests pass
go test ./... -run TestDocIngestion -v — doc ingestion tests pass
go test ./... -run TestClaudeDriftDetection -v — MCP drift tests pass
go test ./... -run TestGate3_Enforcement -v — hallucination gate tests pass
CI bootstrap-integration job passes

Greptile Summary

This PR fixes 9 bootstrap defects spanning path resolution, FK integrity, monorepo detection, doc ingestion, MCP drift, hallucination gating, and git error handling — backed by 50+ regression tests. The implementation is well-intentioned but two critical logic gaps remain that silently undermine headline defect fixes:

Key changes:

GetMemoryBasePath rejects MarkerNone contexts to prevent accidental writes to ~/.taskwing/memory.db ✓
LinkNodes wraps existence check + INSERT in transaction (addresses TOCTOU concern), but SELECT COUNT(*) FROM nodes WHERE id IN (?, ?) fails for self-referential edges due to SQL deduplication — returns 1 instead of 2 when from == to ✗
Gate 3 enforcement (NewFindingWithEvidence) sets VerificationStatus = Skipped when len(convertedEvidence) == 0, but ConvertEvidence does not filter empty FilePath entries. An LLM response with [{"file_path": ""}] bypasses the gate entirely, setting status to Pending instead of Skipped ✗
Detect() now checks hasNestedProjects() in MarkerNone fallback to resolve IsMonorepo misclassification ✓
DocLoader.LoadForServices added and wired into bootstrap for multi-repo doc discovery ✓
installGeminiCLI now returns errors instead of silently dropping them ✓
New bootstrap-integration CI job runs the full regression suite with -failfast, but lacks timeout-minutes guard ✗

Blocking issues: Merge should be held until the two logic bugs in LinkNodes (self-link rejection) and NewFindingWithEvidence (Gate 3 bypass) are fixed. Both allow valid data to be silently rejected or hallucinated data to slip through, directly contradicting the PR's stated defect resolutions.

Confidence Score: 2/5

Two critical logic bugs block merge: self-link edges always rejected due to SQL deduplication, and Gate 3 enforcement bypassed by empty-FilePath evidence.
The majority of fixes are well-targeted and correct (path rejection, git handling, workspace detection, doc loading, MCP drift detection), but two distinct logic bugs in core bootstrap paths directly contradict headline defect resolutions: (1) LinkNodes rejects all self-referential edges due to IN clause deduplication, silently failing valid data; (2) NewFindingWithEvidence allows hallucinated evidence with empty file_path to bypass Gate 3 and be set as Pending instead of Skipped. Both are low-surface-area but high-impact correctness failures. The CI timeout omission is secondary but prevents runaway CI hangs.
internal/memory/sqlite.go (self-link edge rejection), internal/agents/core/parsers.go (Gate 3 bypass with empty FilePath), .github/workflows/ci.yml (missing timeout)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[RunDeterministicBootstrap] --> B[DetectWorkspace]
    B --> C{containsDot services?}
    C -->|No sub-repos with dot| D[LoadForServices]
    C -->|Root service present| E[Load root only]
    D --> F[Aggregate docs from sub-repos]
    E --> G[Root docs only]
    F --> H[Bootstrap continues]
    G --> H

    H --> I[Detect project context]
    I --> J{MarkerType?}
    J -->|MarkerNone| K[GetMemoryBasePath REJECTS]
    J -->|MarkerGit/TaskWing/Lang| L[GetMemoryBasePath OK]

    L --> M[LinkNodes]
    M --> N{SELECT COUNT id IN from,to}
    N -->|from == to: count=1| O["❌ BUG: Erroneously rejected"]
    N -->|count >= 2| P[INSERT edge in tx]

    H --> Q[NewFindingWithEvidence]
    Q --> R{len convertedEvidence == 0?}
    R -->|true| S[VerificationStatus = Skipped]
    R -->|false - even with empty FilePath| T["⚠️ BUG: VerificationStatus = Pending"]
    T --> U{HasEvidence checks FilePath != empty}
    U -->|FilePath empty| V["❌ Pending but HasEvidence=false"]
    U -->|FilePath valid| W[Auto-linkable finding]

Comments Outside Diff (3)

cmd/mcp_install.go, line 126-131 (link)

Verbose mode loses real-time stderr

cmd.Stderr is now unconditionally set to &stderrBuf before the verbose check, meaning verbose mode users no longer see stderr output from the gemini command in real time. The old code explicitly wired cmd.Stderr = os.Stderr under verbose. Now stderr is only surfaced when the command fails — a silent regression for diagnostics.

Prompt To Fix With AI

This is a comment left during a code review.
Path: cmd/mcp_install.go
Line: 126-131

Comment:
**Verbose mode loses real-time stderr**

`cmd.Stderr` is now unconditionally set to `&stderrBuf` before the `verbose` check, meaning verbose mode users no longer see stderr output from the `gemini` command in real time. The old code explicitly wired `cmd.Stderr = os.Stderr` under verbose. Now stderr is only surfaced when the command fails — a silent regression for diagnostics.



How can I resolve this? If you propose a fix, please make it concise.

internal/bootstrap/bootstrap_repro_test.go, line 1168-1179 (link)

Test passes even when the FK-fix regression is present

When err == nil (the pre-fix behaviour — INSERT OR IGNORE swallows FK) AND len(postEdges) == 0 (no edge created, silent data loss), the test calls t.Log(...) instead of t.Error(...). This means a regression that reintroduces the silent-swallow bug would leave this test green.

Prompt To Fix With AI

This is a comment left during a code review.
Path: internal/bootstrap/bootstrap_repro_test.go
Line: 1168-1179

Comment:
**Test passes even when the FK-fix regression is present**

When `err == nil` (the pre-fix behaviour — INSERT OR IGNORE swallows FK) AND `len(postEdges) == 0` (no edge created, silent data loss), the test calls `t.Log(...)` instead of `t.Error(...)`. This means a regression that reintroduces the silent-swallow bug would leave this test green.



How can I resolve this? If you propose a fix, please make it concise.

internal/agents/core/types.go, line 430-432 (link)

Magic confidence threshold — extract as named constant

The 0.5 threshold is the only knob between "auto-apply" and "human verification" for all findings. Embedding it as a literal makes it invisible to future readers and forces a global search-and-replace if the policy changes.
Prompt To Fix With AI
```
This is a comment left during a code review.
Path: internal/agents/core/types.go
Line: 430-432

Comment:
**Magic confidence threshold — extract as named constant**

The `0.5` threshold is the only knob between "auto-apply" and "human verification" for all findings. Embedding it as a literal makes it invisible to future readers and forces a global search-and-replace if the policy changes.



How can I resolve this? If you propose a fix, please make it concise.
```
_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

_{Last reviewed commit: b7cc06f}

Greptile also left 3 inline comments on this PR.

Context used:

Rule used - CLAUDE.md (source)

- Fix RootPath resolving to HOME when ~/.taskwing exists (reject MarkerNone contexts, skip .taskwing above multi-repo workspaces) - Fix FK constraint failures in LinkNodes (pre-check node existence) - Fix IsMonorepo misclassification for multi-repo workspaces - Fix zero docs loaded by adding LoadForServices for sub-repo scanning - Fix hallucinated findings with Gate 3 enforcement (skip findings without evidence) - Fix Claude MCP drift detection with evidence traceability - Fix git exit 128 with IsGitRepository pre-check - Fix Gemini MCP install error handling - Add bootstrap-integration CI job with failfast regression suite

internal/memory/sqlite.go

internal/git/git.go

internal/bootstrap/service.go

- Fix verbose mode losing real-time stderr in Gemini CLI install - Wrap LinkNodes SELECT+INSERT in transaction to prevent TOCTOU race - Route IsGitRepository through Client/Commander for testability - Fix fragile first-element guard in service workspace detection - Change t.Log to t.Error for FK regression assertions - Extract magic 0.5 threshold as MinConfidenceForAutoApply constant

josephgoksu · 2026-03-07T20:35:56Z

@greptileai

Run sync-docs.sh to align managed markdown blocks with canonical partials (legal text formatting, table syntax, command list format). Fixes docs-consistency CI check.

greptile-apps · 2026-03-07T20:42:03Z

internal/memory/sqlite.go

+	if qErr := tx.QueryRow(`SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {
+		return fmt.Errorf("check node existence: %w", qErr)
+	}
+	if existCount < 2 {
+		return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)


Self-referential edges always rejected due to SQL IN deduplication

When from == to, the query SELECT COUNT(*) FROM nodes WHERE id IN (?, ?) evaluates as WHERE id IN (x, x). SQL deduplicates identical values in an IN clause, so COUNT(*) returns 1, not 2. This causes every valid self-referential edge to be silently rejected with "link skipped" even when the node exists.

The check at line 1478 (if existCount < 2) will always fail for self-links, blocking valid use cases like tagging a node with itself or modeling recursive relationships.

Fix — use COUNT(DISTINCT id) instead:

Suggested change

if qErr := tx.QueryRow(`SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {

return fmt.Errorf("check node existence: %w", qErr)

}

if existCount < 2 {

return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)

var existCount int

if qErr := tx.QueryRow(`SELECT COUNT(DISTINCT id) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {

return fmt.Errorf("check node existence: %w", qErr)

}

if existCount < 2 {

return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)

}

Alternatively, guard from == to as a single-node existence check before applying the two-node logic.

Prompt To Fix With AI

This is a comment left during a code review. Path: internal/memory/sqlite.go Line: 1475-1479 Comment: **Self-referential edges always rejected due to SQL `IN` deduplication** When `from == to`, the query `SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)` evaluates as `WHERE id IN (x, x)`. SQL deduplicates identical values in an `IN` clause, so `COUNT(*)` returns 1, not 2. This causes every valid self-referential edge to be silently rejected with "link skipped" even when the node exists. The check at line 1478 (`if existCount < 2`) will always fail for self-links, blocking valid use cases like tagging a node with itself or modeling recursive relationships. Fix — use `COUNT(DISTINCT id)` instead: ```suggestion var existCount int if qErr := tx.QueryRow(`SELECT COUNT(DISTINCT id) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil { return fmt.Errorf("check node existence: %w", qErr) } if existCount < 2 { return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to) } ``` Alternatively, guard `from == to` as a single-node existence check before applying the two-node logic. How can I resolve this? If you propose a fix, please make it concise.

Fixed in 8255ebe — uses expectedCount=1 when from==to to handle IN deduplication.

greptile-apps · 2026-03-07T20:42:04Z

internal/agents/core/parsers.go

+	convertedEvidence := ConvertEvidence(evidence)
+
+	// Gate 3: Findings without evidence start as "skipped" rather than "pending"
+	// to prevent them from being auto-linked into the knowledge graph.
+	verificationStatus := VerificationStatusPending
+	if len(convertedEvidence) == 0 {
+		verificationStatus = VerificationStatusSkipped
+	}


Gate 3 bypass: empty-FilePath evidence slips through as Pending

The gate check at line 67 uses len(convertedEvidence) == 0 as the sole criterion for flagging as Skipped. However, ConvertEvidence() (line 26–34) performs a direct struct copy without filtering:

evidence[i] = Evidence(e) // empty FilePath preserved as-is

This creates a divergence with HasEvidence() in types.go (line 65–71), which explicitly requires FilePath != "". An LLM response with [{"file_path": "", "snippet": "..."}] will:

Pass the gate: len(convertedEvidence) == 1 → VerificationStatus = Pending

Fail HasEvidence(): empty FilePath → returns false

This allows hallucinated findings with empty file paths to be auto-linked into the knowledge graph, defeating the Gate 3 enforcement intended to prevent exactly this.

Fix — align the gate with HasEvidence() by checking that evidence entries have non-empty FilePath:

Suggested change

convertedEvidence := ConvertEvidence(evidence)

// Gate 3: Findings without evidence start as "skipped" rather than "pending"

// to prevent them from being auto-linked into the knowledge graph.

verificationStatus := VerificationStatusPending

if len(convertedEvidence) == 0 {

verificationStatus = VerificationStatusSkipped

}

convertedEvidence := ConvertEvidence(evidence)

// Gate 3: Findings without verifiable evidence (non-empty FilePath) start as

// "skipped" rather than "pending" to prevent hallucinated findings from being

// auto-linked into the knowledge graph.

verificationStatus := VerificationStatusPending

tempFinding := Finding{Evidence: convertedEvidence}

if !tempFinding.HasEvidence() {

verificationStatus = VerificationStatusSkipped

}

Prompt To Fix With AI

This is a comment left during a code review. Path: internal/agents/core/parsers.go Line: 62-69 Comment: **Gate 3 bypass: empty-FilePath evidence slips through as Pending** The gate check at line 67 uses `len(convertedEvidence) == 0` as the sole criterion for flagging as `Skipped`. However, `ConvertEvidence()` (line 26–34) performs a direct struct copy without filtering: ```go evidence[i] = Evidence(e) // empty FilePath preserved as-is ``` This creates a divergence with `HasEvidence()` in types.go (line 65–71), which explicitly requires `FilePath != ""`. An LLM response with `[{"file_path": "", "snippet": "..."}]` will: - Pass the gate: `len(convertedEvidence) == 1` → `VerificationStatus = Pending` - Fail `HasEvidence()`: empty `FilePath` → returns `false` This allows hallucinated findings with empty file paths to be auto-linked into the knowledge graph, defeating the Gate 3 enforcement intended to prevent exactly this. Fix — align the gate with `HasEvidence()` by checking that evidence entries have non-empty FilePath: ```suggestion convertedEvidence := ConvertEvidence(evidence) // Gate 3: Findings without verifiable evidence (non-empty FilePath) start as // "skipped" rather than "pending" to prevent hallucinated findings from being // auto-linked into the knowledge graph. verificationStatus := VerificationStatusPending tempFinding := Finding{Evidence: convertedEvidence} if !tempFinding.HasEvidence() { verificationStatus = VerificationStatusSkipped } ``` How can I resolve this? If you propose a fix, please make it concise.

Fixed in 8255ebe — gate now uses HasEvidence() which checks for non-empty FilePath.

greptile-apps · 2026-03-07T20:42:05Z

.github/workflows/ci.yml

+  bootstrap-integration:
+    runs-on: ubuntu-latest


Add timeout-minutes to prevent CI hangs on stuck tests

The bootstrap-integration job runs 50+ tests with filesystem and SQLite operations but has no job-level timeout-minutes guard. If any test deadlocks or blocks indefinitely, the job will consume runner time until GitHub's 6-hour timeout, blocking other CI runs and wasting minutes.

Suggested change

bootstrap-integration:

runs-on: ubuntu-latest

bootstrap-integration:

runs-on: ubuntu-latest

timeout-minutes: 10

The -failfast flag in the test command (line 58) handles fast failure within the binary, but cannot interrupt a process that has already hung.

Prompt To Fix With AI

This is a comment left during a code review. Path: .github/workflows/ci.yml Line: 46-47 Comment: **Add `timeout-minutes` to prevent CI hangs on stuck tests** The `bootstrap-integration` job runs 50+ tests with filesystem and SQLite operations but has no job-level `timeout-minutes` guard. If any test deadlocks or blocks indefinitely, the job will consume runner time until GitHub's 6-hour timeout, blocking other CI runs and wasting minutes. ```suggestion bootstrap-integration: runs-on: ubuntu-latest timeout-minutes: 10 ``` The `-failfast` flag in the test command (line 58) handles fast failure *within* the binary, but cannot interrupt a process that has already hung. How can I resolve this? If you propose a fix, please make it concise.

Fixed in 8255ebe — added timeout-minutes: 10.

- Fix self-referential edge rejection: IN deduplication causes COUNT(*) to return 1 when from==to, now uses expectedCount - Fix Gate 3 bypass: empty-FilePath evidence no longer slips through as Pending; uses HasEvidence() for consistent checking - Add timeout-minutes: 10 to bootstrap-integration CI job

greptile-apps bot reviewed Mar 7, 2026

View reviewed changes

internal/memory/sqlite.go Outdated Show resolved Hide resolved

internal/git/git.go Outdated Show resolved Hide resolved

internal/bootstrap/service.go Show resolved Hide resolved

docs: sync README.md with doc partials

b7cc06f

Run sync-docs.sh to align managed markdown blocks with canonical partials (legal text formatting, table syntax, command list format). Fixes docs-consistency CI check.

greptile-apps bot reviewed Mar 7, 2026

View reviewed changes

josephgoksu merged commit cfb545f into main Mar 7, 2026
9 checks passed

josephgoksu deleted the fix/bootstrap-defects-regression-tests branch March 7, 2026 23:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(bootstrap): resolve 9 bootstrap defects with 50+ regression tests#20

fix(bootstrap): resolve 9 bootstrap defects with 50+ regression tests#20
josephgoksu merged 4 commits intomainfrom
fix/bootstrap-defects-regression-tests

josephgoksu commented Mar 7, 2026 •

edited by greptile-apps bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

josephgoksu commented Mar 7, 2026

Uh oh!

greptile-apps bot Mar 7, 2026

Uh oh!

josephgoksu Mar 7, 2026

Uh oh!

greptile-apps bot Mar 7, 2026

Uh oh!

josephgoksu Mar 7, 2026

Uh oh!

greptile-apps bot Mar 7, 2026

Uh oh!

josephgoksu Mar 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

josephgoksu commented Mar 7, 2026 • edited by greptile-apps bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Greptile Summary

Confidence Score: 2/5

Flowchart

Comments Outside Diff (3)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

josephgoksu commented Mar 7, 2026

Uh oh!

greptile-apps bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

josephgoksu Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

josephgoksu Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

josephgoksu Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

josephgoksu commented Mar 7, 2026 •

edited by greptile-apps bot

Loading