Comprehensive Codebase Audit & Technical Debt Tracker

Consolidated audit · 2026-03-27 · v0.6.0 (dev) · ~18K lines · Edition 2024 · MSRV 1.94 · [Methodology](#user-content-audit-methodology)

> [!IMPORTANT]
> This issue reflects the **development branch**. Not a released version yet.

> [!NOTE]
> **ID convention:** Two-letter prefixes (FR-, SR-, PR-, TR-, PE-, DR-, UX-) reference formal requirements in [`PRD.md`](PRD.md). Single-letter prefixes (F- for findings, D- for deferred) are temporary IDs local to this audit.

## Executive Summary

| Metric | Value |
| --- | --- |
| Overall Health Score | **7.5 / 10** |
| Critical Issues | **0** |
| Accepted Warnings | **15** |
| Accepted Info | **15** |
| Rejected/Modified | **4** |

<details>
<summary>Strengths, concerns & production-readiness verdict</summary>

**Top 3 Strengths:**
1. Zero `unsafe` code — `#![forbid(unsafe_code)]` enforced (though see F-028 for binary crate gap)
2. Comprehensive test + fuzz infrastructure — 424 tests, 5 fuzz targets, proptest, wiremock, eval harness
3. Well-designed streaming pipeline — all LLM providers support streaming with cancellation, 1MB response cap

**Top 3 Concerns:**
1. Binary crate module duplication undermines `#![forbid(unsafe_code)]` (F-028)
2. Ollama URL validation lacks loopback enforcement — potential diff exfiltration (F-030)
3. Non-hermetic history tests fail under ambient git config (F-032)

**Production-Readiness Verdict:**

Suitable for v0.6.0 release for local/Ollama-centric workflows.
Address F-028, F-030, and F-032 before v1.0 or enterprise distribution.

</details>

## Audit Dimension Summary

| Dimension | Status | Key Concern |
|---|---|---|
| 1. Language & Modern Rust | :green_circle: Good | Minor `strip_prefix` opportunities |
| 2. Async & Concurrency | :yellow_circle: Needs Attention | Sync blocking in hook/clipboard paths |
| 3. Security & Safety | :yellow_circle: Needs Attention | `forbid(unsafe)` gap, Ollama URL validation |
| 4. Error Handling | :green_circle: Good | thiserror + miette, solid diagnostics |
| 5. CLI Design | :green_circle: Good | Config cascade correct; `doctor` misleading for cloud |
| 6. Performance | :yellow_circle: Needs Attention | Binary size 25MB > 15MB target |
| 7. Testing & CI | :yellow_circle: Needs Attention | Non-hermetic tests, Ubuntu-only CI |
| 8. Compliance | :green_circle: Good | ANSSI-FR mostly aligned, SPDX/REUSE compliant |

## Immediate Priority

Quick wins that should ship soon. All verified against source code.

- [x] **F-001**: Add `overflow-checks = true` to `[profile.release]` — unanimous across all reviewers, ANSSI-FR aligned. `Cargo.toml:131-134`
- [ ] **F-011**: Replace `l[1..]` byte indexing with `strip_prefix` in `context.rs` and `splitter.rs` (grep for `&line[1..]` and `&l[1..]`)
- [ ] **F-015**: Add `// SAFETY:` comments to `#[allow(clippy::cast_possible_truncation)]` in `analyzer.rs` and `differ.rs` (multiple sites — grep for `cast_possible_truncation`)
- [ ] **F-028**: Add `#![forbid(unsafe_code)]` to `main.rs` — binary crate re-declares modules via `mod` instead of importing from `lib.rs`, bypassing the safety attribute. Highest-impact finding (single-reviewer catch, confirmed by synthesizer). `main.rs:8`

## Security & Safety

- [x] **F-004**: API keys stored as plain `Option<String>`, never zeroed from memory, inherited by child processes. Consider `secrecy::SecretString`.
- [ ] **F-009**: Project config values briefly exist in Config struct before security overwrite. Single synchronous function, no concurrent access — low risk but could be cleaner. (synthesizer downgraded to INFO)
- [ ] **F-010**: Cloud provider constructors default to empty API key via `unwrap_or_default()`. Normal flow validates key beforehand, but defense-in-depth assertion would help. (synthesizer downgraded to INFO)
- [ ] **F-019**: `--allow-secrets` bypass needs risk documentation in CLI help text.
- [ ] **F-030**: Ollama URL validation is prefix-only (`starts_with("http://")`) with no loopback enforcement. User config or env vars can redirect diff traffic to arbitrary remote endpoints. PRD SR-001 specifies loopback-only resolution — not implemented.

## Async & Concurrency

- [ ] **F-002**: Synchronous `std::process::Command` blocking tokio runtime in `hook_dir()` and `copy_to_clipboard()`. Use `tokio::process::Command` or `spawn_blocking`.
- [ ] **F-003**: Blocking `std::io::stdin().read_line()` on async thread during secret confirmation. Fail-closed design mitigates risk.
- [ ] **F-008**: JoinSet join errors (task panics) silently dropped in git content fetching. Add `warn!` log.
- [ ] **F-020**: Unbounded concurrent `git show` process spawning — no semaphore. 50 staged files = 100 processes.
- [ ] **F-025**: Orphaned Ctrl+C handler task with `.ok()` silently discarding signal registration errors.

## Testing & CI

- [ ] **D-037**: Deleted/Renamed `ChangeStatus` variants never used in test fixtures
- [ ] **D-039**: `classify_span_change` None path untested
- [ ] **D-040**: `make_symbol` test helpers hardcode `line:1, end_line:10` — partially addressed
- [ ] **D-047**: No fuzz testing for non-Rust signature extraction — `fuzz_signature` only tests Rust parser
- [ ] **F-007**: MSRV "verification" in CI uses the same toolchain as all jobs — not actually testing the floor. Add separate `stable` job.
- [ ] **F-016**: CI only tests on Ubuntu. Platform-specific code (Unix permissions, macOS/Windows clipboard, hook paths) untested. Add macOS runner at minimum.
- [ ] **F-017**: No `cargo deny` in CI. PRD SR-005 specifies it for license compliance. Currently only `cargo audit`.
- [ ] **F-018**: No `clippy.toml` — would catch sync `Command` usage (F-002) and stray `dbg!`/`println!` at lint time via `disallowed-methods`/`disallowed-macros`
- [ ] **F-023**: No release automation workflow. PRD DR-002 specifies GitHub Releases via `cargo-dist`.
- [ ] **F-032 / D-007**: History tests non-hermetic — `analyze_repo_with_enough_commits` and `analyze_respects_sample_size` fail under ambient git config (GPG signing, hooks). Set `GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `HOME` to tempdir. Verified by reviewer actually executing `cargo test`.
- [ ] **TR-008**: **Layer 2 LLM quality testing missing** — eval harness tests pre-LLM pipeline but never validates actual commit message quality against a real or mocked LLM

## Performance & Binary Size

- [ ] **D-074**: Targeted hunk parsing optimization — scan `@@` headers instead of full `lines()`.
- [ ] **F-012**: `builtin_patterns()` recompiles 24 regexes per `build_patterns()` call when custom/disabled patterns are configured. Cache base patterns separately.
- [ ] **F-021**: `classify_span_change` allocates two full `String`s for whitespace comparison instead of streaming `Iterator::eq()`. Also applies to new `classify_span_change_rich()` and `AstDiffer::bodies_semantically_equal()`.
- [ ] **F-022**: Secret scanning could use `RegexSet` for batch first-pass matching instead of 24 serial regex per line.
- [ ] **F-033**: Release binary 25MB, above PRD's 15MB target. Real cause: 10 statically-linked tree-sitter grammars. Benchmark `opt-level = "z"/"s"`, consider trimming default features, add binary size gate in CI.
- [ ] **PE-007**: Token-aware truncation — replace char-based budget (~4:1 approximation) with actual BPE/tiktoken counting for accurate context window utilization.

## Semantic Analysis

- [x] **D-002**: Adaptive symbol budget — **Partially addressed.** Budget now adapts: 20% with structural diffs, 30% with signatures only, 20% base. Full count-based scaling not yet implemented.
- [ ] **D-003**: Allow `.scm` query file changes if body-node extraction fails for a language
- [ ] **D-005**: Smart signature truncation — preserve return type at end instead of simple 200-char cap
- [ ] **D-006**: Connection detection `sym_name(` pattern matches comments/strings (false positives)
- [ ] **D-008**: Modified-symbol matching for overloaded methods — `SymbolKey(kind, name, file)` can't disambiguate. Low priority (no real-world failures reported)
- [ ] **D-013**: TypeScript/JavaScript `const_declaration` not in query files — JS/TS constants never extracted
- [ ] **D-014**: Python query limited to `function_definition` and `class_definition` — no decorators, no method distinction
- [ ] **D-049**: Import weighting — categorize Internal/External/Std imports by signal value
- [ ] **D-050**: Cross-file call-site propagation context — when API changes, list which files updated call-sites
- [ ] **D-051**: Error path vs success path detection — detect changes primarily within error handling blocks
- [ ] **D-058**: Incomplete parent scope — missing modules, nested functions, C++ namespaces, anonymous impls. Current impl covers impl/class/trait only.
- [ ] **D-059**: Doc detection false positives — `is_doc_comment()` line-prefix heuristic misclassifies strings containing `///`, `#`, `"""` as comments. AST-based comment detection needed for 100% accuracy.
- [ ] **D-060**: Go import block parsing — multiline `import ()` blocks not handled by line-by-line detection
- [ ] **D-061**: Error handling regex edge cases — `?` at line end and `Result<` match in strings/comments. Also: `detect_intents()` threshold of >2 matches may produce false positives in large diffs with common Rust patterns.
- [ ] **D-062**: Test correlation pattern gaps — `test_foo.rs`, nested test dirs not matched by stem-based matching
- [x] **D-063**: `ChangeDetail` variants not implemented — **Partially addressed (FR-071).** Semantic marker detection implemented: `UnsafeAdded/Removed`, `DeriveAdded/Removed`, `DecoratorAdded/Removed`, `ExportAdded/Removed`, `MutabilityChanged`, `GenericConstraintChanged`. Remaining stubs: `AttributeAdded/Removed` (generic, non-derive), `GenericChanged` (type parameter changes).
- [ ] **D-064**: Lifetime/generic bound changes not detected
- [ ] **D-065**: `BodyUnchanged` redundant when other changes exist — omit to save tokens
- [x] **D-066**: Combined token budget unanalyzed — **Partially addressed (FR-070).** Symbol/diff budget rebalanced when structural diffs are present. A comprehensive budget simulation across all new prompt sections (STRUCTURED CHANGES, IMPORTS CHANGED, RELATED FILES, INTENT) together has not been done.
- [ ] **D-068**: Import alias handling — `use foo as bar` alias not captured
- [ ] **FR-073**: Structural fingerprinting / move detection — AST structure hashes to detect function moves with zero semantic changes.

## Splitter & Message Quality

- [ ] **D-067**: Splitter synergy vague — need specific merge threshold for SymbolDiff integration
- [ ] **D-069**: Internal reranking for small models — sample 2 candidates, score deterministically
- [ ] **D-070**: Graph-based clustering with weighted edges — replace greedy fingerprint clustering
- [ ] **D-071**: Import/dependency scanning — `use crate::...::file_B_stem` for stronger dependency edges
- [ ] **D-072**: LLM-assisted splitting — 2-stage classify+generate as tie-breaker
- [ ] **D-075**: New-import-as-feat signal — detect "new import of types not previously used" as `feat` indicator
- [ ] **FR-074**: Symbol Dependency Graph — replace hardcoded path heuristics in splitter with actual AST-derived dependencies
- [ ] **FR-075**: Configurable file categorization — custom patterns in config for proprietary build systems

## CLI & UX

- [ ] **D-016**: Diff truncated mid-line in `--dry-run` output — display/streaming issue
- [ ] **D-017**: Commit message printed twice in `--dry-run` output
- [ ] **F-029**: `commitbee doctor` doesn't verify cloud provider connectivity — only checks API key presence (`is_some()`). Ollama gets a real health check but OpenAI/Anthropic don't. Call `provider.verify().await`.
- [ ] **F-034**: Terminal progress lacks ASCII fallback for `TERM=dumb` or limited terminals. PRD UX-002 specifies this.

## Observability

- [ ] **F-013**: No `#[instrument]` spans on key pipeline functions (`generate_commit`, `extract_symbols`, `build`). Would improve debugging and timing visibility.
- [x] ~~**F-024**: Test count documentation drift~~ — **Fixed in v0.6.0.** All docs now say 424.

## Code Quality & Modernization

- [ ] **D-028**: HARD LIMIT section in user prompt duplicates system prompt instructions
- [ ] **D-032**: `CodeSymbol` all-pub fields without `#[non_exhaustive]` — only matters if publishing as library
- [ ] **D-034**: `PromptContext` all-pub fields without `#[non_exhaustive]` — same caveat
- [ ] **F-005**: `reqwest` TLS backend not explicitly pinned. Synthesizer notes: reqwest 0.13 uses platform-native TLS by default. Pin to `rustls-tls` if pure-Rust stack desired, but native TLS is acceptable.
- [ ] **F-027**: Lossy float-to-integer cast in `history.rs:49` — use `.round() as u32`
- [ ] **F-028 (extended)**: Refactor `main.rs` to consume library crate instead of re-declaring module tree

## Architecture

- [ ] **D-052**: Migrate `git show` calls to gix object reading — reduce process overhead on large repos
- [ ] **D-055**: Architecture plugin trait for `AnalyzerService` — trait-based per-language isolation
- [ ] **D-056**: Evidence "Why" hook — hidden `rationale` field for debugging incorrect inference
- [ ] **D-057**: Contextual sibling API display — include unchanged public functions in same scope for context
- [ ] **F-031**: Non-atomic split commits — TOCTOU window between unstage/re-stage with no rollback. Documented known issue.

## Compliance

| Framework | Applies? | Current Status | Gap |
|---|---|---|---|
| ANSSI-FR Secure Rust | Advisory | Mostly compliant | `overflow-checks` (F-001), binary `forbid(unsafe)` gap (F-028) |
| SPDX/REUSE | Yes | Compliant | None |
| Open Source Licensing | Yes | Compliant — all deps MIT/Apache-2.0 | None |
| EU AI Act | Minimal risk | Compliant | None |
| GDPR/CCPA | Marginally | Adequate mitigations | PII in diffs is transient |
| EU Cyber Resilience Act | Commercial only | Not started | SBOM generation needed by Dec 2027 |

## Dependency Health

All 30 direct dependencies clean. No CVEs. Full audit by `cargo audit` (433 crate graph).

| Notable | Version | Note |
|---|---|---|
| reqwest | 0.13.2 | Platform-native TLS default |
| gix | 0.80.0 | Historical gix-transport advisory patched |
| tokio | 1.50.0 | Named-pipe advisory patched |
| tree-sitter | 0.26.7 | Clean |

## Methodology

<details>
<summary>Audit process details</summary>

<a name="audit-methodology"></a>

### Dialectic Verification Pipeline

The audit uses a multi-stage dialectic verification process designed to prevent anchoring bias:

1. **Parallel independent review** — Multiple AI models (3-5 per run) analyze the target simultaneously. Each reviewer works in isolation with no access to other reviewers' findings. Models are drawn from different families to maximize perspective diversity.
2. **Deduplication** — An orchestrator merges overlapping findings, flags conflicts, and normalizes severity.
3. **Independent synthesis** — A fresh-context synthesizer (no exposure to the orchestration process) reads every finding and verifies each claim against actual source code. Every finding receives an explicit ACCEPT/REJECT/MODIFY decision with evidence.

### This Audit

- **Code audit (2026-03-26)**: 5 reviewers across 3 model families. 97 raw findings deduplicated to 34. Synthesizer accepted 30, rejected 2, modified 4.
- **Semantic plans audit (2026-03-26)**: 3 reviewers (GLM5, Codex gpt-5.4, Gemini 2.5 Pro) verified FR-064–FR-072 implementation plans. 12 fixes accepted, 11 items deferred.
- **Deferred items**: Accumulated from prior dialectic reviews of implementation plans (AST context overhaul, semantic understanding tiers, message quality), architecture analyses, and post-implementation code reviews. Each followed the same parallel review + synthesis pipeline.
- **Notable**: One reviewer uniquely identified F-028 (the `forbid(unsafe)` binary crate gap), F-030 (loopback enforcement), and F-032 (non-hermetic tests) — the latter verified by actually executing `cargo test` on the codebase. This validates that independent multi-model review catches issues no single reviewer would find.

</details>


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comprehensive Codebase Audit & Technical Debt Tracker #3

Executive Summary

Audit Dimension Summary

Immediate Priority

Security & Safety

Async & Concurrency

Testing & CI

Performance & Binary Size

Semantic Analysis

Splitter & Message Quality

CLI & UX

Observability

Code Quality & Modernization

Architecture

Compliance

Dependency Health

Methodology

Dialectic Verification Pipeline

This Audit

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Metric	Value
Overall Health Score	7.5 / 10
Critical Issues	0
Accepted Warnings	15
Accepted Info	15
Rejected/Modified	4

Dimension	Status	Key Concern
1. Language & Modern Rust	🟢 Good	Minor `strip_prefix` opportunities
2. Async & Concurrency	🟡 Needs Attention	Sync blocking in hook/clipboard paths
3. Security & Safety	🟡 Needs Attention	`forbid(unsafe)` gap, Ollama URL validation
4. Error Handling	🟢 Good	thiserror + miette, solid diagnostics
5. CLI Design	🟢 Good	Config cascade correct; `doctor` misleading for cloud
6. Performance	🟡 Needs Attention	Binary size 25MB > 15MB target
7. Testing & CI	🟡 Needs Attention	Non-hermetic tests, Ubuntu-only CI
8. Compliance	🟢 Good	ANSSI-FR mostly aligned, SPDX/REUSE compliant

Framework	Applies?	Current Status	Gap
ANSSI-FR Secure Rust	Advisory	Mostly compliant	`overflow-checks` (F-001), binary `forbid(unsafe)` gap (F-028)
SPDX/REUSE	Yes	Compliant	None
Open Source Licensing	Yes	Compliant — all deps MIT/Apache-2.0	None
EU AI Act	Minimal risk	Compliant	None
GDPR/CCPA	Marginally	Adequate mitigations	PII in diffs is transient
EU Cyber Resilience Act	Commercial only	Not started	SBOM generation needed by Dec 2027

Notable	Version	Note
reqwest	0.13.2	Platform-native TLS default
gix	0.80.0	Historical gix-transport advisory patched
tokio	1.50.0	Named-pipe advisory patched
tree-sitter	0.26.7	Clean

Uh oh!

Comprehensive Codebase Audit & Technical Debt Tracker #3

Description

Executive Summary

Audit Dimension Summary

Immediate Priority

Security & Safety

Async & Concurrency

Testing & CI

Performance & Binary Size

Semantic Analysis

Splitter & Message Quality

CLI & UX

Observability

Code Quality & Modernization

Architecture

Compliance

Dependency Health

Methodology

Dialectic Verification Pipeline

This Audit

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions