Skip to content

fix(bootstrap): resolve 9 bootstrap defects with 50+ regression tests#20

Merged
josephgoksu merged 4 commits intomainfrom
fix/bootstrap-defects-regression-tests
Mar 7, 2026
Merged

fix(bootstrap): resolve 9 bootstrap defects with 50+ regression tests#20
josephgoksu merged 4 commits intomainfrom
fix/bootstrap-defects-regression-tests

Conversation

@josephgoksu
Copy link
Owner

@josephgoksu josephgoksu commented Mar 7, 2026

Summary

  • RootPath→HOME: GetMemoryBasePath rejects MarkerNone contexts; Detect() skips .taskwing markers above multi-repo workspaces
  • FK constraint failures: LinkNodes pre-checks node existence before INSERT (avoids SQLite error 787)
  • IsMonorepo misclassification: Detect() checks hasNestedProjects() in MarkerNone fallback, resolving disagreement with DetectWorkspace()
  • Zero docs loaded: Added LoadForServices to DocLoader; wired into RunDeterministicBootstrap via workspace auto-detection
  • Hallucinated findings: Gate 3 enforcement — findings without evidence start as "skipped"; added HasEvidence() and NeedsHumanVerification()
  • Claude MCP drift: Filesystem-based drift detection with consent enforcement for global mutations
  • Git exit 128: IsGitRepository() pre-check before running git log
  • Gemini MCP install: installGeminiCLI now returns errors properly
  • CI: Added bootstrap-integration job with failfast regression suite

Test plan

  • go test ./... -run TestBootstrapRepro -v — all 9 defect repro tests pass
  • go test ./... -run TestKnowledgeLinking_NoFK -v — FK constraint tests pass
  • go test ./... -run TestRootPathResolution -v — path resolution tests pass
  • go test ./... -run TestIsMonorepoDetection -v — workspace detection tests pass
  • go test ./... -run TestDocIngestion -v — doc ingestion tests pass
  • go test ./... -run TestClaudeDriftDetection -v — MCP drift tests pass
  • go test ./... -run TestGate3_Enforcement -v — hallucination gate tests pass
  • CI bootstrap-integration job passes

Greptile Summary

This PR fixes 9 bootstrap defects spanning path resolution, FK integrity, monorepo detection, doc ingestion, MCP drift, hallucination gating, and git error handling — backed by 50+ regression tests. The implementation is well-intentioned but two critical logic gaps remain that silently undermine headline defect fixes:

Key changes:

  • GetMemoryBasePath rejects MarkerNone contexts to prevent accidental writes to ~/.taskwing/memory.db
  • LinkNodes wraps existence check + INSERT in transaction (addresses TOCTOU concern), but SELECT COUNT(*) FROM nodes WHERE id IN (?, ?) fails for self-referential edges due to SQL deduplication — returns 1 instead of 2 when from == to
  • Gate 3 enforcement (NewFindingWithEvidence) sets VerificationStatus = Skipped when len(convertedEvidence) == 0, but ConvertEvidence does not filter empty FilePath entries. An LLM response with [{"file_path": ""}] bypasses the gate entirely, setting status to Pending instead of Skipped
  • Detect() now checks hasNestedProjects() in MarkerNone fallback to resolve IsMonorepo misclassification ✓
  • DocLoader.LoadForServices added and wired into bootstrap for multi-repo doc discovery ✓
  • installGeminiCLI now returns errors instead of silently dropping them ✓
  • New bootstrap-integration CI job runs the full regression suite with -failfast, but lacks timeout-minutes guard ✗

Blocking issues: Merge should be held until the two logic bugs in LinkNodes (self-link rejection) and NewFindingWithEvidence (Gate 3 bypass) are fixed. Both allow valid data to be silently rejected or hallucinated data to slip through, directly contradicting the PR's stated defect resolutions.

Confidence Score: 2/5

  • Two critical logic bugs block merge: self-link edges always rejected due to SQL deduplication, and Gate 3 enforcement bypassed by empty-FilePath evidence.
  • The majority of fixes are well-targeted and correct (path rejection, git handling, workspace detection, doc loading, MCP drift detection), but two distinct logic bugs in core bootstrap paths directly contradict headline defect resolutions: (1) LinkNodes rejects all self-referential edges due to IN clause deduplication, silently failing valid data; (2) NewFindingWithEvidence allows hallucinated evidence with empty file_path to bypass Gate 3 and be set as Pending instead of Skipped. Both are low-surface-area but high-impact correctness failures. The CI timeout omission is secondary but prevents runaway CI hangs.
  • internal/memory/sqlite.go (self-link edge rejection), internal/agents/core/parsers.go (Gate 3 bypass with empty FilePath), .github/workflows/ci.yml (missing timeout)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[RunDeterministicBootstrap] --> B[DetectWorkspace]
    B --> C{containsDot services?}
    C -->|No sub-repos with dot| D[LoadForServices]
    C -->|Root service present| E[Load root only]
    D --> F[Aggregate docs from sub-repos]
    E --> G[Root docs only]
    F --> H[Bootstrap continues]
    G --> H

    H --> I[Detect project context]
    I --> J{MarkerType?}
    J -->|MarkerNone| K[GetMemoryBasePath REJECTS]
    J -->|MarkerGit/TaskWing/Lang| L[GetMemoryBasePath OK]

    L --> M[LinkNodes]
    M --> N{SELECT COUNT id IN from,to}
    N -->|from == to: count=1| O["❌ BUG: Erroneously rejected"]
    N -->|count >= 2| P[INSERT edge in tx]

    H --> Q[NewFindingWithEvidence]
    Q --> R{len convertedEvidence == 0?}
    R -->|true| S[VerificationStatus = Skipped]
    R -->|false - even with empty FilePath| T["⚠️ BUG: VerificationStatus = Pending"]
    T --> U{HasEvidence checks FilePath != empty}
    U -->|FilePath empty| V["❌ Pending but HasEvidence=false"]
    U -->|FilePath valid| W[Auto-linkable finding]
Loading

Comments Outside Diff (3)

  1. cmd/mcp_install.go, line 126-131 (link)

    Verbose mode loses real-time stderr

    cmd.Stderr is now unconditionally set to &stderrBuf before the verbose check, meaning verbose mode users no longer see stderr output from the gemini command in real time. The old code explicitly wired cmd.Stderr = os.Stderr under verbose. Now stderr is only surfaced when the command fails — a silent regression for diagnostics.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: cmd/mcp_install.go
    Line: 126-131
    
    Comment:
    **Verbose mode loses real-time stderr**
    
    `cmd.Stderr` is now unconditionally set to `&stderrBuf` before the `verbose` check, meaning verbose mode users no longer see stderr output from the `gemini` command in real time. The old code explicitly wired `cmd.Stderr = os.Stderr` under verbose. Now stderr is only surfaced when the command fails — a silent regression for diagnostics.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

  2. internal/bootstrap/bootstrap_repro_test.go, line 1168-1179 (link)

    Test passes even when the FK-fix regression is present

    When err == nil (the pre-fix behaviour — INSERT OR IGNORE swallows FK) AND len(postEdges) == 0 (no edge created, silent data loss), the test calls t.Log(...) instead of t.Error(...). This means a regression that reintroduces the silent-swallow bug would leave this test green.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: internal/bootstrap/bootstrap_repro_test.go
    Line: 1168-1179
    
    Comment:
    **Test passes even when the FK-fix regression is present**
    
    When `err == nil` (the pre-fix behaviour — INSERT OR IGNORE swallows FK) AND `len(postEdges) == 0` (no edge created, silent data loss), the test calls `t.Log(...)` instead of `t.Error(...)`. This means a regression that reintroduces the silent-swallow bug would leave this test green.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

  3. internal/agents/core/types.go, line 430-432 (link)

    Magic confidence threshold — extract as named constant

    The 0.5 threshold is the only knob between "auto-apply" and "human verification" for all findings. Embedding it as a literal makes it invisible to future readers and forces a global search-and-replace if the policy changes.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: internal/agents/core/types.go
    Line: 430-432
    
    Comment:
    **Magic confidence threshold — extract as named constant**
    
    The `0.5` threshold is the only knob between "auto-apply" and "human verification" for all findings. Embedding it as a literal makes it invisible to future readers and forces a global search-and-replace if the policy changes.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Fix in Claude Code

Fix All in Claude Code

Last reviewed commit: b7cc06f

Greptile also left 3 inline comments on this PR.

Context used:

  • Rule used - CLAUDE.md (source)

- Fix RootPath resolving to HOME when ~/.taskwing exists (reject MarkerNone
  contexts, skip .taskwing above multi-repo workspaces)
- Fix FK constraint failures in LinkNodes (pre-check node existence)
- Fix IsMonorepo misclassification for multi-repo workspaces
- Fix zero docs loaded by adding LoadForServices for sub-repo scanning
- Fix hallucinated findings with Gate 3 enforcement (skip findings without evidence)
- Fix Claude MCP drift detection with evidence traceability
- Fix git exit 128 with IsGitRepository pre-check
- Fix Gemini MCP install error handling
- Add bootstrap-integration CI job with failfast regression suite
- Fix verbose mode losing real-time stderr in Gemini CLI install
- Wrap LinkNodes SELECT+INSERT in transaction to prevent TOCTOU race
- Route IsGitRepository through Client/Commander for testability
- Fix fragile first-element guard in service workspace detection
- Change t.Log to t.Error for FK regression assertions
- Extract magic 0.5 threshold as MinConfidenceForAutoApply constant
@josephgoksu
Copy link
Owner Author

@greptileai

Run sync-docs.sh to align managed markdown blocks with canonical
partials (legal text formatting, table syntax, command list format).
Fixes docs-consistency CI check.
Comment on lines +1475 to +1479
if qErr := tx.QueryRow(`SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {
return fmt.Errorf("check node existence: %w", qErr)
}
if existCount < 2 {
return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-referential edges always rejected due to SQL IN deduplication

When from == to, the query SELECT COUNT(*) FROM nodes WHERE id IN (?, ?) evaluates as WHERE id IN (x, x). SQL deduplicates identical values in an IN clause, so COUNT(*) returns 1, not 2. This causes every valid self-referential edge to be silently rejected with "link skipped" even when the node exists.

The check at line 1478 (if existCount < 2) will always fail for self-links, blocking valid use cases like tagging a node with itself or modeling recursive relationships.

Fix — use COUNT(DISTINCT id) instead:

Suggested change
if qErr := tx.QueryRow(`SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {
return fmt.Errorf("check node existence: %w", qErr)
}
if existCount < 2 {
return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)
var existCount int
if qErr := tx.QueryRow(`SELECT COUNT(DISTINCT id) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {
return fmt.Errorf("check node existence: %w", qErr)
}
if existCount < 2 {
return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)
}

Alternatively, guard from == to as a single-node existence check before applying the two-node logic.

Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/memory/sqlite.go
Line: 1475-1479

Comment:
**Self-referential edges always rejected due to SQL `IN` deduplication**

When `from == to`, the query `SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)` evaluates as `WHERE id IN (x, x)`. SQL deduplicates identical values in an `IN` clause, so `COUNT(*)` returns 1, not 2. This causes every valid self-referential edge to be silently rejected with "link skipped" even when the node exists.

The check at line 1478 (`if existCount < 2`) will always fail for self-links, blocking valid use cases like tagging a node with itself or modeling recursive relationships.

Fix — use `COUNT(DISTINCT id)` instead:

```suggestion
	var existCount int
	if qErr := tx.QueryRow(`SELECT COUNT(DISTINCT id) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {
		return fmt.Errorf("check node existence: %w", qErr)
	}
	if existCount < 2 {
		return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)
	}
```

Alternatively, guard `from == to` as a single-node existence check before applying the two-node logic.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8255ebe — uses expectedCount=1 when from==to to handle IN deduplication.

Comment on lines +62 to +69
convertedEvidence := ConvertEvidence(evidence)

// Gate 3: Findings without evidence start as "skipped" rather than "pending"
// to prevent them from being auto-linked into the knowledge graph.
verificationStatus := VerificationStatusPending
if len(convertedEvidence) == 0 {
verificationStatus = VerificationStatusSkipped
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gate 3 bypass: empty-FilePath evidence slips through as Pending

The gate check at line 67 uses len(convertedEvidence) == 0 as the sole criterion for flagging as Skipped. However, ConvertEvidence() (line 26–34) performs a direct struct copy without filtering:

evidence[i] = Evidence(e) // empty FilePath preserved as-is

This creates a divergence with HasEvidence() in types.go (line 65–71), which explicitly requires FilePath != "". An LLM response with [{"file_path": "", "snippet": "..."}] will:

  • Pass the gate: len(convertedEvidence) == 1VerificationStatus = Pending
  • Fail HasEvidence(): empty FilePath → returns false

This allows hallucinated findings with empty file paths to be auto-linked into the knowledge graph, defeating the Gate 3 enforcement intended to prevent exactly this.

Fix — align the gate with HasEvidence() by checking that evidence entries have non-empty FilePath:

Suggested change
convertedEvidence := ConvertEvidence(evidence)
// Gate 3: Findings without evidence start as "skipped" rather than "pending"
// to prevent them from being auto-linked into the knowledge graph.
verificationStatus := VerificationStatusPending
if len(convertedEvidence) == 0 {
verificationStatus = VerificationStatusSkipped
}
convertedEvidence := ConvertEvidence(evidence)
// Gate 3: Findings without verifiable evidence (non-empty FilePath) start as
// "skipped" rather than "pending" to prevent hallucinated findings from being
// auto-linked into the knowledge graph.
verificationStatus := VerificationStatusPending
tempFinding := Finding{Evidence: convertedEvidence}
if !tempFinding.HasEvidence() {
verificationStatus = VerificationStatusSkipped
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/agents/core/parsers.go
Line: 62-69

Comment:
**Gate 3 bypass: empty-FilePath evidence slips through as Pending**

The gate check at line 67 uses `len(convertedEvidence) == 0` as the sole criterion for flagging as `Skipped`. However, `ConvertEvidence()` (line 26–34) performs a direct struct copy without filtering:

```go
evidence[i] = Evidence(e) // empty FilePath preserved as-is
```

This creates a divergence with `HasEvidence()` in types.go (line 65–71), which explicitly requires `FilePath != ""`. An LLM response with `[{"file_path": "", "snippet": "..."}]` will:
- Pass the gate: `len(convertedEvidence) == 1``VerificationStatus = Pending`
- Fail `HasEvidence()`: empty `FilePath` → returns `false`

This allows hallucinated findings with empty file paths to be auto-linked into the knowledge graph, defeating the Gate 3 enforcement intended to prevent exactly this.

Fix — align the gate with `HasEvidence()` by checking that evidence entries have non-empty FilePath:

```suggestion
	convertedEvidence := ConvertEvidence(evidence)

	// Gate 3: Findings without verifiable evidence (non-empty FilePath) start as
	// "skipped" rather than "pending" to prevent hallucinated findings from being
	// auto-linked into the knowledge graph.
	verificationStatus := VerificationStatusPending
	tempFinding := Finding{Evidence: convertedEvidence}
	if !tempFinding.HasEvidence() {
		verificationStatus = VerificationStatusSkipped
	}
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8255ebe — gate now uses HasEvidence() which checks for non-empty FilePath.

Comment on lines +46 to +47
bootstrap-integration:
runs-on: ubuntu-latest
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add timeout-minutes to prevent CI hangs on stuck tests

The bootstrap-integration job runs 50+ tests with filesystem and SQLite operations but has no job-level timeout-minutes guard. If any test deadlocks or blocks indefinitely, the job will consume runner time until GitHub's 6-hour timeout, blocking other CI runs and wasting minutes.

Suggested change
bootstrap-integration:
runs-on: ubuntu-latest
bootstrap-integration:
runs-on: ubuntu-latest
timeout-minutes: 10

The -failfast flag in the test command (line 58) handles fast failure within the binary, but cannot interrupt a process that has already hung.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/ci.yml
Line: 46-47

Comment:
**Add `timeout-minutes` to prevent CI hangs on stuck tests**

The `bootstrap-integration` job runs 50+ tests with filesystem and SQLite operations but has no job-level `timeout-minutes` guard. If any test deadlocks or blocks indefinitely, the job will consume runner time until GitHub's 6-hour timeout, blocking other CI runs and wasting minutes.

```suggestion
  bootstrap-integration:
    runs-on: ubuntu-latest
    timeout-minutes: 10
```

The `-failfast` flag in the test command (line 58) handles fast failure *within* the binary, but cannot interrupt a process that has already hung.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8255ebe — added timeout-minutes: 10.

- Fix self-referential edge rejection: IN deduplication causes
  COUNT(*) to return 1 when from==to, now uses expectedCount
- Fix Gate 3 bypass: empty-FilePath evidence no longer slips through
  as Pending; uses HasEvidence() for consistent checking
- Add timeout-minutes: 10 to bootstrap-integration CI job
@josephgoksu josephgoksu merged commit cfb545f into main Mar 7, 2026
9 checks passed
@josephgoksu josephgoksu deleted the fix/bootstrap-defects-regression-tests branch March 7, 2026 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant