-
-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Consolidated audit · 2026-03-27 · v0.6.0 (dev) · ~18K lines · Edition 2024 · MSRV 1.94 · Methodology
Important
This issue reflects the development branch. Not a released version yet.
Note
ID convention: Two-letter prefixes (FR-, SR-, PR-, TR-, PE-, DR-, UX-) reference formal requirements in PRD.md. Single-letter prefixes (F- for findings, D- for deferred) are temporary IDs local to this audit.
Executive Summary
| Metric | Value |
|---|---|
| Overall Health Score | 7.5 / 10 |
| Critical Issues | 0 |
| Accepted Warnings | 15 |
| Accepted Info | 15 |
| Rejected/Modified | 4 |
Strengths, concerns & production-readiness verdict
Top 3 Strengths:
- Zero
unsafecode —#![forbid(unsafe_code)]enforced (though see F-028 for binary crate gap) - Comprehensive test + fuzz infrastructure — 424 tests, 5 fuzz targets, proptest, wiremock, eval harness
- Well-designed streaming pipeline — all LLM providers support streaming with cancellation, 1MB response cap
Top 3 Concerns:
- Binary crate module duplication undermines
# - Ollama URL validation lacks loopback enforcement — potential diff exfiltration (F-030)
- Non-hermetic history tests fail under ambient git config (F-032)
Production-Readiness Verdict:
Suitable for v0.6.0 release for local/Ollama-centric workflows.
Address F-028, F-030, and F-032 before v1.0 or enterprise distribution.
Audit Dimension Summary
| Dimension | Status | Key Concern |
|---|---|---|
| 1. Language & Modern Rust | 🟢 Good | Minor strip_prefix opportunities |
| 2. Async & Concurrency | 🟡 Needs Attention | Sync blocking in hook/clipboard paths |
| 3. Security & Safety | 🟡 Needs Attention | forbid(unsafe) gap, Ollama URL validation |
| 4. Error Handling | 🟢 Good | thiserror + miette, solid diagnostics |
| 5. CLI Design | 🟢 Good | Config cascade correct; doctor misleading for cloud |
| 6. Performance | 🟡 Needs Attention | Binary size 25MB > 15MB target |
| 7. Testing & CI | 🟡 Needs Attention | Non-hermetic tests, Ubuntu-only CI |
| 8. Compliance | 🟢 Good | ANSSI-FR mostly aligned, SPDX/REUSE compliant |
Immediate Priority
Quick wins that should ship soon. All verified against source code.
- F-001: Add
overflow-checks = trueto[profile.release]— unanimous across all reviewers, ANSSI-FR aligned.Cargo.toml:131-134 - F-011: Replace
l[1..]byte indexing withstrip_prefixincontext.rsandsplitter.rs(grep for&line[1..]and&l[1..]) - F-015: Add
// SAFETY:comments to#[allow(clippy::cast_possible_truncation)]inanalyzer.rsanddiffer.rs(multiple sites — grep forcast_possible_truncation) - F-028: Add
#![forbid(unsafe_code)]tomain.rs— binary crate re-declares modules viamodinstead of importing fromlib.rs, bypassing the safety attribute. Highest-impact finding (single-reviewer catch, confirmed by synthesizer).main.rs:8
Security & Safety
- F-004: API keys stored as plain
Option<String>, never zeroed from memory, inherited by child processes. Considersecrecy::SecretString. - F-009: Project config values briefly exist in Config struct before security overwrite. Single synchronous function, no concurrent access — low risk but could be cleaner. (synthesizer downgraded to INFO)
- F-010: Cloud provider constructors default to empty API key via
unwrap_or_default(). Normal flow validates key beforehand, but defense-in-depth assertion would help. (synthesizer downgraded to INFO) - F-019:
--allow-secretsbypass needs risk documentation in CLI help text. - F-030: Ollama URL validation is prefix-only (
starts_with("http://")) with no loopback enforcement. User config or env vars can redirect diff traffic to arbitrary remote endpoints. PRD SR-001 specifies loopback-only resolution — not implemented.
Async & Concurrency
- F-002: Synchronous
std::process::Commandblocking tokio runtime inhook_dir()andcopy_to_clipboard(). Usetokio::process::Commandorspawn_blocking. - F-003: Blocking
std::io::stdin().read_line()on async thread during secret confirmation. Fail-closed design mitigates risk. - F-008: JoinSet join errors (task panics) silently dropped in git content fetching. Add
warn!log. - F-020: Unbounded concurrent
git showprocess spawning — no semaphore. 50 staged files = 100 processes. - F-025: Orphaned Ctrl+C handler task with
.ok()silently discarding signal registration errors.
Testing & CI
- D-037: Deleted/Renamed
ChangeStatusvariants never used in test fixtures - D-039:
classify_span_changeNone path untested - D-040:
make_symboltest helpers hardcodeline:1, end_line:10— partially addressed - D-047: No fuzz testing for non-Rust signature extraction —
fuzz_signatureonly tests Rust parser - F-007: MSRV "verification" in CI uses the same toolchain as all jobs — not actually testing the floor. Add separate
stablejob. - F-016: CI only tests on Ubuntu. Platform-specific code (Unix permissions, macOS/Windows clipboard, hook paths) untested. Add macOS runner at minimum.
- F-017: No
cargo denyin CI. PRD SR-005 specifies it for license compliance. Currently onlycargo audit. - F-018: No
clippy.toml— would catch syncCommandusage (F-002) and straydbg!/println!at lint time viadisallowed-methods/disallowed-macros - F-023: No release automation workflow. PRD DR-002 specifies GitHub Releases via
cargo-dist. - F-032 / D-007: History tests non-hermetic —
analyze_repo_with_enough_commitsandanalyze_respects_sample_sizefail under ambient git config (GPG signing, hooks). SetGIT_CONFIG_NOSYSTEM=1,GIT_CONFIG_GLOBAL=/dev/null,HOMEto tempdir. Verified by reviewer actually executingcargo test. - TR-008: Layer 2 LLM quality testing missing — eval harness tests pre-LLM pipeline but never validates actual commit message quality against a real or mocked LLM
Performance & Binary Size
- D-074: Targeted hunk parsing optimization — scan
@@headers instead of fulllines(). - F-012:
builtin_patterns()recompiles 24 regexes perbuild_patterns()call when custom/disabled patterns are configured. Cache base patterns separately. - F-021:
classify_span_changeallocates two fullStrings for whitespace comparison instead of streamingIterator::eq(). Also applies to newclassify_span_change_rich()andAstDiffer::bodies_semantically_equal(). - F-022: Secret scanning could use
RegexSetfor batch first-pass matching instead of 24 serial regex per line. - F-033: Release binary 25MB, above PRD's 15MB target. Real cause: 10 statically-linked tree-sitter grammars. Benchmark
opt-level = "z"/"s", consider trimming default features, add binary size gate in CI. - PE-007: Token-aware truncation — replace char-based budget (~4:1 approximation) with actual BPE/tiktoken counting for accurate context window utilization.
Semantic Analysis
- D-002: Adaptive symbol budget — Partially addressed. Budget now adapts: 20% with structural diffs, 30% with signatures only, 20% base. Full count-based scaling not yet implemented.
- D-003: Allow
.scmquery file changes if body-node extraction fails for a language - D-005: Smart signature truncation — preserve return type at end instead of simple 200-char cap
- D-006: Connection detection
sym_name(pattern matches comments/strings (false positives) - D-008: Modified-symbol matching for overloaded methods —
SymbolKey(kind, name, file)can't disambiguate. Low priority (no real-world failures reported) - D-013: TypeScript/JavaScript
const_declarationnot in query files — JS/TS constants never extracted - D-014: Python query limited to
function_definitionandclass_definition— no decorators, no method distinction - D-049: Import weighting — categorize Internal/External/Std imports by signal value
- D-050: Cross-file call-site propagation context — when API changes, list which files updated call-sites
- D-051: Error path vs success path detection — detect changes primarily within error handling blocks
- D-058: Incomplete parent scope — missing modules, nested functions, C++ namespaces, anonymous impls. Current impl covers impl/class/trait only.
- D-059: Doc detection false positives —
is_doc_comment()line-prefix heuristic misclassifies strings containing///,#,"""as comments. AST-based comment detection needed for 100% accuracy. - D-060: Go import block parsing — multiline
import ()blocks not handled by line-by-line detection - D-061: Error handling regex edge cases —
?at line end andResult<match in strings/comments. Also:detect_intents()threshold of >2 matches may produce false positives in large diffs with common Rust patterns. - D-062: Test correlation pattern gaps —
test_foo.rs, nested test dirs not matched by stem-based matching - D-063:
ChangeDetailvariants not implemented — Partially addressed (FR-071). Semantic marker detection implemented:UnsafeAdded/Removed,DeriveAdded/Removed,DecoratorAdded/Removed,ExportAdded/Removed,MutabilityChanged,GenericConstraintChanged. Remaining stubs:AttributeAdded/Removed(generic, non-derive),GenericChanged(type parameter changes). - D-064: Lifetime/generic bound changes not detected
- D-065:
BodyUnchangedredundant when other changes exist — omit to save tokens - D-066: Combined token budget unanalyzed — Partially addressed (FR-070). Symbol/diff budget rebalanced when structural diffs are present. A comprehensive budget simulation across all new prompt sections (STRUCTURED CHANGES, IMPORTS CHANGED, RELATED FILES, INTENT) together has not been done.
- D-068: Import alias handling —
use foo as baralias not captured - FR-073: Structural fingerprinting / move detection — AST structure hashes to detect function moves with zero semantic changes.
Splitter & Message Quality
- D-067: Splitter synergy vague — need specific merge threshold for SymbolDiff integration
- D-069: Internal reranking for small models — sample 2 candidates, score deterministically
- D-070: Graph-based clustering with weighted edges — replace greedy fingerprint clustering
- D-071: Import/dependency scanning —
use crate::...::file_B_stemfor stronger dependency edges - D-072: LLM-assisted splitting — 2-stage classify+generate as tie-breaker
- D-075: New-import-as-feat signal — detect "new import of types not previously used" as
featindicator - FR-074: Symbol Dependency Graph — replace hardcoded path heuristics in splitter with actual AST-derived dependencies
- FR-075: Configurable file categorization — custom patterns in config for proprietary build systems
CLI & UX
- D-016: Diff truncated mid-line in
--dry-runoutput — display/streaming issue - D-017: Commit message printed twice in
--dry-runoutput - F-029:
commitbee doctordoesn't verify cloud provider connectivity — only checks API key presence (is_some()). Ollama gets a real health check but OpenAI/Anthropic don't. Callprovider.verify().await. - F-034: Terminal progress lacks ASCII fallback for
TERM=dumbor limited terminals. PRD UX-002 specifies this.
Observability
- F-013: No
#[instrument]spans on key pipeline functions (generate_commit,extract_symbols,build). Would improve debugging and timing visibility. -
F-024: Test count documentation drift— Fixed in v0.6.0. All docs now say 424.
Code Quality & Modernization
- D-028: HARD LIMIT section in user prompt duplicates system prompt instructions
- D-032:
CodeSymbolall-pub fields without#[non_exhaustive]— only matters if publishing as library - D-034:
PromptContextall-pub fields without#[non_exhaustive]— same caveat - F-005:
reqwestTLS backend not explicitly pinned. Synthesizer notes: reqwest 0.13 uses platform-native TLS by default. Pin torustls-tlsif pure-Rust stack desired, but native TLS is acceptable. - F-027: Lossy float-to-integer cast in
history.rs:49— use.round() as u32 - F-028 (extended): Refactor
main.rsto consume library crate instead of re-declaring module tree
Architecture
- D-052: Migrate
git showcalls to gix object reading — reduce process overhead on large repos - D-055: Architecture plugin trait for
AnalyzerService— trait-based per-language isolation - D-056: Evidence "Why" hook — hidden
rationalefield for debugging incorrect inference - D-057: Contextual sibling API display — include unchanged public functions in same scope for context
- F-031: Non-atomic split commits — TOCTOU window between unstage/re-stage with no rollback. Documented known issue.
Compliance
| Framework | Applies? | Current Status | Gap |
|---|---|---|---|
| ANSSI-FR Secure Rust | Advisory | Mostly compliant | overflow-checks (F-001), binary forbid(unsafe) gap (F-028) |
| SPDX/REUSE | Yes | Compliant | None |
| Open Source Licensing | Yes | Compliant — all deps MIT/Apache-2.0 | None |
| EU AI Act | Minimal risk | Compliant | None |
| GDPR/CCPA | Marginally | Adequate mitigations | PII in diffs is transient |
| EU Cyber Resilience Act | Commercial only | Not started | SBOM generation needed by Dec 2027 |
Dependency Health
All 30 direct dependencies clean. No CVEs. Full audit by cargo audit (433 crate graph).
| Notable | Version | Note |
|---|---|---|
| reqwest | 0.13.2 | Platform-native TLS default |
| gix | 0.80.0 | Historical gix-transport advisory patched |
| tokio | 1.50.0 | Named-pipe advisory patched |
| tree-sitter | 0.26.7 | Clean |
Methodology
Audit process details
Dialectic Verification Pipeline
The audit uses a multi-stage dialectic verification process designed to prevent anchoring bias:
- Parallel independent review — Multiple AI models (3-5 per run) analyze the target simultaneously. Each reviewer works in isolation with no access to other reviewers' findings. Models are drawn from different families to maximize perspective diversity.
- Deduplication — An orchestrator merges overlapping findings, flags conflicts, and normalizes severity.
- Independent synthesis — A fresh-context synthesizer (no exposure to the orchestration process) reads every finding and verifies each claim against actual source code. Every finding receives an explicit ACCEPT/REJECT/MODIFY decision with evidence.
This Audit
- Code audit (2026-03-26): 5 reviewers across 3 model families. 97 raw findings deduplicated to 34. Synthesizer accepted 30, rejected 2, modified 4.
- Semantic plans audit (2026-03-26): 3 reviewers (GLM5, Codex gpt-5.4, Gemini 2.5 Pro) verified FR-064–FR-072 implementation plans. 12 fixes accepted, 11 items deferred.
- Deferred items: Accumulated from prior dialectic reviews of implementation plans (AST context overhaul, semantic understanding tiers, message quality), architecture analyses, and post-implementation code reviews. Each followed the same parallel review + synthesis pipeline.
- Notable: One reviewer uniquely identified F-028 (the
forbid(unsafe)binary crate gap), F-030 (loopback enforcement), and F-032 (non-hermetic tests) — the latter verified by actually executingcargo teston the codebase. This validates that independent multi-model review catches issues no single reviewer would find.