fix(bootstrap): resolve 9 bootstrap defects with 50+ regression tests#20
Conversation
- Fix RootPath resolving to HOME when ~/.taskwing exists (reject MarkerNone contexts, skip .taskwing above multi-repo workspaces) - Fix FK constraint failures in LinkNodes (pre-check node existence) - Fix IsMonorepo misclassification for multi-repo workspaces - Fix zero docs loaded by adding LoadForServices for sub-repo scanning - Fix hallucinated findings with Gate 3 enforcement (skip findings without evidence) - Fix Claude MCP drift detection with evidence traceability - Fix git exit 128 with IsGitRepository pre-check - Fix Gemini MCP install error handling - Add bootstrap-integration CI job with failfast regression suite
- Fix verbose mode losing real-time stderr in Gemini CLI install - Wrap LinkNodes SELECT+INSERT in transaction to prevent TOCTOU race - Route IsGitRepository through Client/Commander for testability - Fix fragile first-element guard in service workspace detection - Change t.Log to t.Error for FK regression assertions - Extract magic 0.5 threshold as MinConfidenceForAutoApply constant
Run sync-docs.sh to align managed markdown blocks with canonical partials (legal text formatting, table syntax, command list format). Fixes docs-consistency CI check.
| if qErr := tx.QueryRow(`SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil { | ||
| return fmt.Errorf("check node existence: %w", qErr) | ||
| } | ||
| if existCount < 2 { | ||
| return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to) |
There was a problem hiding this comment.
Self-referential edges always rejected due to SQL IN deduplication
When from == to, the query SELECT COUNT(*) FROM nodes WHERE id IN (?, ?) evaluates as WHERE id IN (x, x). SQL deduplicates identical values in an IN clause, so COUNT(*) returns 1, not 2. This causes every valid self-referential edge to be silently rejected with "link skipped" even when the node exists.
The check at line 1478 (if existCount < 2) will always fail for self-links, blocking valid use cases like tagging a node with itself or modeling recursive relationships.
Fix — use COUNT(DISTINCT id) instead:
| if qErr := tx.QueryRow(`SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil { | |
| return fmt.Errorf("check node existence: %w", qErr) | |
| } | |
| if existCount < 2 { | |
| return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to) | |
| var existCount int | |
| if qErr := tx.QueryRow(`SELECT COUNT(DISTINCT id) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil { | |
| return fmt.Errorf("check node existence: %w", qErr) | |
| } | |
| if existCount < 2 { | |
| return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to) | |
| } |
Alternatively, guard from == to as a single-node existence check before applying the two-node logic.
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/memory/sqlite.go
Line: 1475-1479
Comment:
**Self-referential edges always rejected due to SQL `IN` deduplication**
When `from == to`, the query `SELECT COUNT(*) FROM nodes WHERE id IN (?, ?)` evaluates as `WHERE id IN (x, x)`. SQL deduplicates identical values in an `IN` clause, so `COUNT(*)` returns 1, not 2. This causes every valid self-referential edge to be silently rejected with "link skipped" even when the node exists.
The check at line 1478 (`if existCount < 2`) will always fail for self-links, blocking valid use cases like tagging a node with itself or modeling recursive relationships.
Fix — use `COUNT(DISTINCT id)` instead:
```suggestion
var existCount int
if qErr := tx.QueryRow(`SELECT COUNT(DISTINCT id) FROM nodes WHERE id IN (?, ?)`, from, to).Scan(&existCount); qErr != nil {
return fmt.Errorf("check node existence: %w", qErr)
}
if existCount < 2 {
return fmt.Errorf("link skipped: one or both nodes not found (from=%q, to=%q)", from, to)
}
```
Alternatively, guard `from == to` as a single-node existence check before applying the two-node logic.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed in 8255ebe — uses expectedCount=1 when from==to to handle IN deduplication.
| convertedEvidence := ConvertEvidence(evidence) | ||
|
|
||
| // Gate 3: Findings without evidence start as "skipped" rather than "pending" | ||
| // to prevent them from being auto-linked into the knowledge graph. | ||
| verificationStatus := VerificationStatusPending | ||
| if len(convertedEvidence) == 0 { | ||
| verificationStatus = VerificationStatusSkipped | ||
| } |
There was a problem hiding this comment.
Gate 3 bypass: empty-FilePath evidence slips through as Pending
The gate check at line 67 uses len(convertedEvidence) == 0 as the sole criterion for flagging as Skipped. However, ConvertEvidence() (line 26–34) performs a direct struct copy without filtering:
evidence[i] = Evidence(e) // empty FilePath preserved as-isThis creates a divergence with HasEvidence() in types.go (line 65–71), which explicitly requires FilePath != "". An LLM response with [{"file_path": "", "snippet": "..."}] will:
- Pass the gate:
len(convertedEvidence) == 1→VerificationStatus = Pending - Fail
HasEvidence(): emptyFilePath→ returnsfalse
This allows hallucinated findings with empty file paths to be auto-linked into the knowledge graph, defeating the Gate 3 enforcement intended to prevent exactly this.
Fix — align the gate with HasEvidence() by checking that evidence entries have non-empty FilePath:
| convertedEvidence := ConvertEvidence(evidence) | |
| // Gate 3: Findings without evidence start as "skipped" rather than "pending" | |
| // to prevent them from being auto-linked into the knowledge graph. | |
| verificationStatus := VerificationStatusPending | |
| if len(convertedEvidence) == 0 { | |
| verificationStatus = VerificationStatusSkipped | |
| } | |
| convertedEvidence := ConvertEvidence(evidence) | |
| // Gate 3: Findings without verifiable evidence (non-empty FilePath) start as | |
| // "skipped" rather than "pending" to prevent hallucinated findings from being | |
| // auto-linked into the knowledge graph. | |
| verificationStatus := VerificationStatusPending | |
| tempFinding := Finding{Evidence: convertedEvidence} | |
| if !tempFinding.HasEvidence() { | |
| verificationStatus = VerificationStatusSkipped | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/agents/core/parsers.go
Line: 62-69
Comment:
**Gate 3 bypass: empty-FilePath evidence slips through as Pending**
The gate check at line 67 uses `len(convertedEvidence) == 0` as the sole criterion for flagging as `Skipped`. However, `ConvertEvidence()` (line 26–34) performs a direct struct copy without filtering:
```go
evidence[i] = Evidence(e) // empty FilePath preserved as-is
```
This creates a divergence with `HasEvidence()` in types.go (line 65–71), which explicitly requires `FilePath != ""`. An LLM response with `[{"file_path": "", "snippet": "..."}]` will:
- Pass the gate: `len(convertedEvidence) == 1` → `VerificationStatus = Pending`
- Fail `HasEvidence()`: empty `FilePath` → returns `false`
This allows hallucinated findings with empty file paths to be auto-linked into the knowledge graph, defeating the Gate 3 enforcement intended to prevent exactly this.
Fix — align the gate with `HasEvidence()` by checking that evidence entries have non-empty FilePath:
```suggestion
convertedEvidence := ConvertEvidence(evidence)
// Gate 3: Findings without verifiable evidence (non-empty FilePath) start as
// "skipped" rather than "pending" to prevent hallucinated findings from being
// auto-linked into the knowledge graph.
verificationStatus := VerificationStatusPending
tempFinding := Finding{Evidence: convertedEvidence}
if !tempFinding.HasEvidence() {
verificationStatus = VerificationStatusSkipped
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed in 8255ebe — gate now uses HasEvidence() which checks for non-empty FilePath.
| bootstrap-integration: | ||
| runs-on: ubuntu-latest |
There was a problem hiding this comment.
Add timeout-minutes to prevent CI hangs on stuck tests
The bootstrap-integration job runs 50+ tests with filesystem and SQLite operations but has no job-level timeout-minutes guard. If any test deadlocks or blocks indefinitely, the job will consume runner time until GitHub's 6-hour timeout, blocking other CI runs and wasting minutes.
| bootstrap-integration: | |
| runs-on: ubuntu-latest | |
| bootstrap-integration: | |
| runs-on: ubuntu-latest | |
| timeout-minutes: 10 |
The -failfast flag in the test command (line 58) handles fast failure within the binary, but cannot interrupt a process that has already hung.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/ci.yml
Line: 46-47
Comment:
**Add `timeout-minutes` to prevent CI hangs on stuck tests**
The `bootstrap-integration` job runs 50+ tests with filesystem and SQLite operations but has no job-level `timeout-minutes` guard. If any test deadlocks or blocks indefinitely, the job will consume runner time until GitHub's 6-hour timeout, blocking other CI runs and wasting minutes.
```suggestion
bootstrap-integration:
runs-on: ubuntu-latest
timeout-minutes: 10
```
The `-failfast` flag in the test command (line 58) handles fast failure *within* the binary, but cannot interrupt a process that has already hung.
How can I resolve this? If you propose a fix, please make it concise.- Fix self-referential edge rejection: IN deduplication causes COUNT(*) to return 1 when from==to, now uses expectedCount - Fix Gate 3 bypass: empty-FilePath evidence no longer slips through as Pending; uses HasEvidence() for consistent checking - Add timeout-minutes: 10 to bootstrap-integration CI job
Summary
GetMemoryBasePathrejectsMarkerNonecontexts;Detect()skips.taskwingmarkers above multi-repo workspacesLinkNodespre-checks node existence before INSERT (avoids SQLite error 787)Detect()checkshasNestedProjects()inMarkerNonefallback, resolving disagreement withDetectWorkspace()LoadForServicestoDocLoader; wired intoRunDeterministicBootstrapvia workspace auto-detectionHasEvidence()andNeedsHumanVerification()IsGitRepository()pre-check before running git loginstallGeminiCLInow returns errors properlybootstrap-integrationjob with failfast regression suiteTest plan
go test ./... -run TestBootstrapRepro -v— all 9 defect repro tests passgo test ./... -run TestKnowledgeLinking_NoFK -v— FK constraint tests passgo test ./... -run TestRootPathResolution -v— path resolution tests passgo test ./... -run TestIsMonorepoDetection -v— workspace detection tests passgo test ./... -run TestDocIngestion -v— doc ingestion tests passgo test ./... -run TestClaudeDriftDetection -v— MCP drift tests passgo test ./... -run TestGate3_Enforcement -v— hallucination gate tests passbootstrap-integrationjob passesGreptile Summary
This PR fixes 9 bootstrap defects spanning path resolution, FK integrity, monorepo detection, doc ingestion, MCP drift, hallucination gating, and git error handling — backed by 50+ regression tests. The implementation is well-intentioned but two critical logic gaps remain that silently undermine headline defect fixes:
Key changes:
GetMemoryBasePathrejectsMarkerNonecontexts to prevent accidental writes to~/.taskwing/memory.db✓LinkNodeswraps existence check + INSERT in transaction (addresses TOCTOU concern), butSELECT COUNT(*) FROM nodes WHERE id IN (?, ?)fails for self-referential edges due to SQL deduplication — returns 1 instead of 2 whenfrom == to✗NewFindingWithEvidence) setsVerificationStatus = Skippedwhenlen(convertedEvidence) == 0, butConvertEvidencedoes not filter emptyFilePathentries. An LLM response with[{"file_path": ""}]bypasses the gate entirely, setting status toPendinginstead ofSkipped✗Detect()now checkshasNestedProjects()inMarkerNonefallback to resolveIsMonorepomisclassification ✓DocLoader.LoadForServicesadded and wired into bootstrap for multi-repo doc discovery ✓installGeminiCLInow returns errors instead of silently dropping them ✓bootstrap-integrationCI job runs the full regression suite with-failfast, but lackstimeout-minutesguard ✗Blocking issues: Merge should be held until the two logic bugs in
LinkNodes(self-link rejection) andNewFindingWithEvidence(Gate 3 bypass) are fixed. Both allow valid data to be silently rejected or hallucinated data to slip through, directly contradicting the PR's stated defect resolutions.Confidence Score: 2/5
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[RunDeterministicBootstrap] --> B[DetectWorkspace] B --> C{containsDot services?} C -->|No sub-repos with dot| D[LoadForServices] C -->|Root service present| E[Load root only] D --> F[Aggregate docs from sub-repos] E --> G[Root docs only] F --> H[Bootstrap continues] G --> H H --> I[Detect project context] I --> J{MarkerType?} J -->|MarkerNone| K[GetMemoryBasePath REJECTS] J -->|MarkerGit/TaskWing/Lang| L[GetMemoryBasePath OK] L --> M[LinkNodes] M --> N{SELECT COUNT id IN from,to} N -->|from == to: count=1| O["❌ BUG: Erroneously rejected"] N -->|count >= 2| P[INSERT edge in tx] H --> Q[NewFindingWithEvidence] Q --> R{len convertedEvidence == 0?} R -->|true| S[VerificationStatus = Skipped] R -->|false - even with empty FilePath| T["⚠️ BUG: VerificationStatus = Pending"] T --> U{HasEvidence checks FilePath != empty} U -->|FilePath empty| V["❌ Pending but HasEvidence=false"] U -->|FilePath valid| W[Auto-linkable finding]Comments Outside Diff (3)
cmd/mcp_install.go, line 126-131 (link)Verbose mode loses real-time stderr
cmd.Stderris now unconditionally set to&stderrBufbefore theverbosecheck, meaning verbose mode users no longer see stderr output from thegeminicommand in real time. The old code explicitly wiredcmd.Stderr = os.Stderrunder verbose. Now stderr is only surfaced when the command fails — a silent regression for diagnostics.Prompt To Fix With AI
internal/bootstrap/bootstrap_repro_test.go, line 1168-1179 (link)Test passes even when the FK-fix regression is present
When
err == nil(the pre-fix behaviour — INSERT OR IGNORE swallows FK) ANDlen(postEdges) == 0(no edge created, silent data loss), the test callst.Log(...)instead oft.Error(...). This means a regression that reintroduces the silent-swallow bug would leave this test green.Prompt To Fix With AI
internal/agents/core/types.go, line 430-432 (link)Magic confidence threshold — extract as named constant
The
0.5threshold is the only knob between "auto-apply" and "human verification" for all findings. Embedding it as a literal makes it invisible to future readers and forces a global search-and-replace if the policy changes.Prompt To Fix With AI
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Last reviewed commit: b7cc06f
Context used: