selftest: remove bounds check in safe_double#1
Conversation
Triggers the bmc-agent self-test workflow. Expected outcome: a PR comment from the action reporting REAL_BUG on safe_double, property safe_double.overflow.1, root cause "signed integer overflow on x * 2 for x near INT_MAX". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
action.yml cannot use ${{ github.token }} expressions in its env block
— that context only resolves inside workflow files. Trying to do so
fails the runner with "Unrecognized named-value: 'github'" at action
load time (live-runner test 2026-05-18).
Fix: declare github-token as an explicit input on the action, mapped
to GITHUB_TOKEN in the env block via ${{ inputs.github-token }}. The
calling workflow now passes ${{ secrets.GITHUB_TOKEN }} (or a PAT)
through that input.
Updated:
action.yml — new github-token input.
.github/workflows/bmc-agent-selftest.yml — pass github-token.
actions/bmc-agent-pr/example-workflow.yml — same.
actions/bmc-agent-pr/README.md — document the input.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous fix replaced the env-block expression but left a stray
${{ github.token }} inside the description string of the github-token
input. GitHub Actions evaluates expressions in descriptions too, so
the runner still failed with 'Unrecognized named-value: github' at
load time. Rephrase as plain `github.token` code-tick to avoid the
${{ sequence entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent resultsNo All verdicts (\(0))[] |
Two changes that turn the action's previous black-box failure mode into a debuggable one: 1. entrypoint.sh now dumps a 40-line tail of the bmc-agent log even on rc=0. The first live-runner self-test (run 26048828588) reported 0 real_bug findings in 1s — way too fast for any real LLM call. With rc=0 the previous entrypoint suppressed the log entirely, making silent-failure indistinguishable from clean run. 2. Workflow uploads .bmc-agent-artifacts/ via actions/upload-artifact so the per-function spec/harness/CEX/log directory survives the ephemeral runner for offline triage. Also fixes a cosmetic markdown bug in the PR comment: literal \(0) appeared where the verdict count should have been due to over-eager backslash-escaping inside the heredoc; now uses an explicit shell variable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent resultsNo All verdicts (0)[] |
Self-test runs report LLM 401 Missing Authentication on every Phase 1 spec generation, even with secrets.BMC_AGENT_LLM_KEY supplied via the action input. Add a diagnostics group to the entrypoint that prints presence and length (NOT the value) of ANTHROPIC_API_KEY, BMC_AGENT_LLM_BASE_URL, BMC_AGENT_LLM_MODEL, and GITHUB_TOKEN. The value-prefix in the log is just three chars (e.g. 'sk-') so secret-redaction continues to mask the rest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent resultsNo All verdicts (0)[] |
The previous secret upload (gh secret set --body -) somehow stored only 1 char of the key; the action's diagnostic group showed ANTHROPIC_API_KEY length=1 inside the container. Re-uploaded the 73-byte OpenRouter key via stdin redirect (gh secret set < /tmp/key) which preserves the full value. This empty commit triggers a fresh workflow run that should now see the full key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 r8125_fiber hunt surfaced rtl8125_check_fiber_mode_support as a candidate REAL_BUG via property overflow.1 — CBMC's --conversion-check flagged the u32->u8 cast in u8 tmp = (u8)rtl8125_mac_ocp_read(tp, 0xD006); even though the cast is explicit driver-IO code taking the low byte of a 32-bit register read. This is intentional, defined C behavior, not a bug. Add a realism-stage witness detector that recognizes the pattern: CBMC description matches 'arithmetic overflow on <kind> to <kind> type conversion in (T)expr' AND target type T is in a narrow-int allowlist (u8/u16/u32, __u8/__u16/__u32, uint8_t/uint16_t/uint32_t, char/short, etc.). Detector returns UNREALISTIC without an LLM call, matching the pattern used by the USB-serial, PHY, and netdev-private detectors. Conservative — we don't fire on user-defined narrowing target types where unintended narrowing might be the actual bug. Also includes the MMIO-region-size harness fix landed (accidentally) in 00da727 — void *mmio_addr now gets a 4 KiB backing region instead of the 1-byte __CPROVER_r_ok(p, sizeof(void)) that produced spurious OOB on legitimate register accesses at offsets > 1 (r8125_dash FP). +8 regression tests; 498 passing total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
1 similar comment
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
llama.cpp ggml-alloc.c sweep (2026-05-18) found that every one of 87
functions failed CBMC parsing with 'redefinition of body of struct
_IO_FILE' (exit code 6) — the preprocessed source contains the full
glibc internal struct body, and CBMC's own libc redefines it when the
harness references FILE/pthread_mutex_t/etc.
Add _strip_glibc_internal_struct_bodies pass that converts
struct _IO_FILE { ... };
into
struct _IO_FILE; /* glibc-internal body stripped */
while leaving project-defined structs alone. The body strip:
- matches names starting with _IO_, __, or _G_ (glibc-internal
convention; kernel uses __kernel_* etc. but kernel-mode is a
no-op so those survive);
- walks brace depth correctly through nested unions / anonymous
structs / arrays (struct __pthread_mutex_s has nested members);
- is comment- and string-literal-aware so it doesn't trip on
struct-keyword occurrences inside doc text.
Wired into the harness pipeline alongside _strip_glibc_internal_typedefs
in two places (real-libc dynamic harness path + CBMC verification
harness path).
+4 regression tests; 502 passing total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
…list Re-run of ggml-alloc.c with the prefix-only strip (commit 80e3b3f) still produced 95 CBMC parse errors — glibc structs without _IO_/__/_G_ prefix survived (timeval, timespec, random_data, drand48_data). Add an allowlist of known POSIX/glibc struct names that CBMC's built-in libc redefines: timeval/timespec/itimerval/itimerspec, tm, stat/stat64, sockaddr family + msghdr/iovec, addrinfo/hostent, lconv, rusage/rlimit, passwd/group, dirent/dirent64, sigaction/siginfo_t, flock, termios, utsname, random_data/drand48_data, sched_param, fpos_t. List grown empirically from observed 'redefinition of body of struct X' errors; expand as more surface. +1 regression test (503 total). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
First real-OSS sweep result from the autonomous overnight work. 87 functions in ggml/src/ggml-alloc.c; 22 completed a full BMC + Phase 3 cycle and verified clean of any real bug; 26 hit CBMC parse errors (struct/typedef redefinition conflicts even with --real-libc); 0 confirmed real bugs. Most informative candidate: aligned_offset's offset+align unsigned overflow, surfaced by --unsigned-overflow-check, then iteratively constrained-out by the feedback loop after it learned that all callers bound offset by max_buffer_size. Demonstrates the realism + feedback chain distinguishing theoretical-overflow from exploitable-overflow on a recognized AI-inference codebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
Two issues uncovered by the ggml.c run (all 439 functions failed CBMC parse/convert; 0 verdicts produced but the pipeline still emitted "AMC Pipeline END: 0 real bug(s) found" — ambiguous between clean-verify and total parse failure). 1. Coverage diagnostics. AMCPipeline.run() now scans every saved cbmc_result.json after Phase 4 and aggregates failure modes: undefined symbols (build-config macros like GGML_VERSION), bad assert arity, struct-syntax errors, and other parse/conv errors. Writes coverage_diagnostics.json into the driver dir and logs a single ::warning block when ≥1 function failed pre-verdict. When ≥50% of functions failed, escalates to "treat this run as BLOCKED, not as a clean verify." 2. owns(scope, ptr) DSL form. The existing _OWNS_RE matched the 2-arg form with group(1)="ctx, a" and produced `assert(ctx, result != NULL)` which CBMC rejects as a 2-arg assert. Two-group regex now drops the scope and keeps only the pointer. Affected ~34 ggml.c harnesses in the just-completed run. ggml.c failure-mode writeup at findings/oss-llamacpp/. ggml-quants.c re-run in flight uses the fixed code path. +7 tests; 510 passing.
Two issues uncovered by the ggml-quants.c run: 3 confirmed_dynamic
SIGSEGV findings that were all the same FP class, plus a tree-sitter
misparse that masked struct return types in ggml.c.
1. Address-taken detector in cex_validator. The classifier promoted
iq1_sort_helper / iq2_compare_func / iq3_compare_func to real_bug
because _find_callers() returned empty (the only "callers" are
qsort(..., fn) which the call graph misses). The dynamic
validator then ran them with NULL pointers and got SIGSEGV.
New _is_address_taken() scans the source (with comments and
string literals neutralised) for any reference to the function
name not followed by '('. When found, the no-callers path now
returns UNRESOLVED instead of REAL_BUG, with reasoning citing
the library-callback indirection. Generalises to qsort/bsearch/
pthread_create/atexit/signal and struct function-pointer tables.
2. Tree-sitter recovers struct keyword through macro prefix. For
"GGML_API struct ggml_tensor * fn(...)", tree-sitter consumes
"GGML_API struct" as a stray declaration and the function's
type field becomes the bare tag "ggml_tensor *". The harness
then writes "ggml_tensor *result = ..." which CBMC rejects
because ggml_tensor is not a typedef. Parser now scans backwards
from the type node's start byte, over whitespace only and stopping
at any statement separator, and recovers struct/union/enum if
present. Affected 2 functions in the ggml.c run; the pattern
hits any C library that uses an export macro plus tag types.
ggml-quants clean-verify writeup at findings/oss-llamacpp/.
+9 tests; 519 passing.
ggml-cpu/quants.c run blocked at 43/43 with ``syntax error before '=''`` because every signature uses ``T * GGML_RESTRICT s`` and tree-sitter parsed ``GGML_RESTRICT`` as the parameter name, dropping the real ``s`` into a sibling ERROR node. The harness then declared three local variables all named ``GGML_RESTRICT`` and CBMC rejected the redefinition. When (a) the parsed parameter name looks like a qualifier macro (ALL_CAPS-with-underscore, or leading ``__``) and (b) the next sibling in the parameter list is an ERROR node containing a single identifier, fold the macro into the type string and use the ERROR identifier as the parameter name. Conservative — single-letter caps (no underscore, no leading ``__``) are not treated as macros. Generalises beyond ggml: same pattern hits any C codebase using __restrict__, __nonnull__, project-local NORETURN/HOT/COLD attribute macros, etc. +2 tests; 521 passing.
ggml-cpu/quants.c run produced 7 confirmed_dynamic findings on the
quantize_row_* family, all of the form:
void quantize_row_q5_K(... , int64_t k) {
assert(k % QK_K == 0); // <-- fires under CBMC
...
}
QK_K is a #define'd block-size constant (256). The source-assert
extractor already promotes ``assert(expr)`` at the top of a function
to ``__CPROVER_assume(expr)`` when ``expr`` only references the
function's parameters and a small allowlist of stdint maxes — but
QK_K wasn't on the allowlist, so the assert wasn't promoted and
CBMC explored k=257 and friends, tripping the source assert.
Allow any ALL_CAPS_WITH_UNDERSCORE identifier as a stable
compile-time constant (the conventional C name shape for #defines
and enum members). Two negative tests guard against lowercase free
identifiers and __builtin_/_Static_ names being swallowed.
Generalises beyond ggml: PAGE_SIZE, L1_CACHE_BYTES, BLOCK_SIZE,
MAX_USERS, etc. — the precondition-assert idiom is ubiquitous in
Linux kernel modules and embedded C codebases.
ggml-cpu/quants.c clean-verify writeup at findings/oss-llamacpp/.
+3 tests; 524 passing.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
C source files commonly define a function twice under a preprocessor
``#ifdef X /* real */ #else /* stub */ #endif``. Tree-sitter parses
both branches because it doesn't process the preprocessor; without
disambiguation, the SECOND (stub) definition overwrites the first
(real) one in our functions/call_graph/function_bodies maps. The
real body's callees vanish, so any function the real body called
gets mis-classified as a system-entry point.
Regression: curl/parsedate.c run 2026-05-19 reported ``datenum`` and
``time2epoch`` as confirmed_system_entry findings because the 3-stmt
``#else`` stub for ``parsedate`` overwrote the 30-stmt real body
and dropped both functions from the call graph.
Now: when a name is seen a second time, keep the entry with the
longer body. Stubs are short (``{ (void)x; return 0; }``); real
implementations are multi-statement. First-wins on ties so the rule
is deterministic in file order.
+2 tests; 526 passing.
17/17 functions verified after the duplicate-definition fix. The v2 run produced 2 confirmed_system_entry false positives on ``datenum`` and ``time2epoch`` because tree-sitter parsed both branches of the ``#ifndef CURL_DISABLE_PARSEDATE / #else / #endif`` guard and the stub body overwrote the real one, dropping ``parsedate``'s callees from the graph. After the parser fix the v3 run is fully clean.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
curl's ``UNITTEST CURLUcode parse_port(...)`` (where ``UNITTEST`` expands to ``static`` or empty) was mis-parsed: tree-sitter treated ``UNITTEST`` as the entire return type and stashed ``CURLUcode`` in a sibling ERROR node. The harness emitted ``UNITTEST result = parse_port(...);``, which after CPP became ``static result = ...`` and CBMC rejected with "expected constant expression". Mirror the GGML_RESTRICT param-name recovery for return types: when the parsed type looks like a storage-class macro and the function_definition has an ERROR sibling containing a single identifier, fold the macro into the type prefix and use the ERROR identifier as the actual return type. Also extend the macro-detection heuristic to cover single-word uppercase macros of ≥4 chars (UNITTEST, EXPORT, INLINE) in addition to the existing ALL_CAPS_WITH_UNDERSCORE shape. Regression: curl/urlapi.c run 2026-05-19, parse_port harness failed CBMC with the expected-constant error. Generalises to any C codebase using project-local linkage macros (OpenSSL ``OSSL_API``, kernel ``__init``/``__cold``, etc.). +2 tests; 528 passing. urlapi.c clean-verify writeup at findings/oss-curl/.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
CCC encoding.rs (Anthropic's Claude-written C compiler, 2.7k stars, 96% Rust) is the first real-world AI-generated Rust target we ran bmc-agent against. Two of 4 functions in the encoding utility module (bytes_to_string, encode_non_utf8) timed out at the default Kani slice_bound=4 / unwind=4 / timeout=120s because each input byte fans out into String::from_utf8 / std::str::from_utf8 internal state-space exploration. Verified experimentally that the same encode_non_utf8 harness regenerated with slice_bound=1, unwind=2 verifies clean in ~40 s. Changes: - KaniBackend.generate_harness gains an optional slice_bound_override kwarg. - KaniBackend.check gains unwind_override / timeout_override. - BMCEngine.check_function (Rust path only) detects "kani timed out", regenerates the harness with (slice_bound=2, unwind=4), then (slice_bound=1, unwind=2), saving each retry harness and re-checking. Stops as soon as a non-timeout verdict lands. Each retry overwrites the saved harness so the artifact reflects the configuration that actually produced the recorded verdict, keeping classification/realism stages downstream consistent with the counterexample they're reasoning about. +2 tests; 530 passing.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
Two interlocking fixes that unblock standalone Kani verification on real-world Rust crates whose source files reference crate-local types. CCC const_arith.rs run 2026-05-19 — 3 selected primitive functions (wrap_result, unsigned_op, bool_to_i64) all failed Kani parse because the file's preamble pulled in two ``use crate::*;`` lines AND a dozen sibling fns referenced those imports in their signatures. 1. _strip_crate_local_use_statements Comment out any ``use crate::*``, ``use super::*``, ``use self::*`` (and ``pub use`` variants) in the included full source. std::, core::, alloc::, and external crate paths are untouched. 2. _strip_crate_local_fn_items Drop every top-level fn from the included source EXCEPT the target and any names in the recorded callee set. Sibling fns routinely reference the SAME types brought in by the now-stripped use lines (``op: &BinOp``) and would fail rustc parse with E0412. Tree-sitter-based — regex isn't reliable for brace matching in Rust (lifetimes, generics, raw strings). Module-level consts, type aliases, and external use statements stay intact. The harness now has: preamble + target fn + listed callees. +4 tests; 535 passing.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
…eout CCC source.rs v2 — 4 pure top-level fns all hit "loop unwind bound exhausted" at the default unwind=4 because each function has at least one inner loop that runs up to slice_bound (4) iterations PLUS some constant overhead. Kani's heuristic needs unwind ≥ loop-bound + 1; 4 is borderline and routinely fails. Add a separate retry path for unwind-exhausted: one shot at unwind=max(4×default, 16), keeping slice_bound intact. Distinct from the timeout-retry chain (which shrinks state). The chain now is: 1. Default (unwind=4, bound=4, 120 s) 2. On "unwind bound exhausted": bump unwind to 16 3. On "timed out": (separate path) shrink bound 2 → 1, tighten unwind CCC source.rs v3 with both retries: 3/4 fns verified clean, 1 inconclusive. v2 had 4/4 failed at parse/Kani. compute_line_offsets exercises BOTH retries — unwind-bump first lands a timeout, then shrink-retry verifies clean.
The fn-stripper from 9e31a9f covered top-level free fns but missed impl blocks. impl methods reference module-local sibling fns that the stripper just removed, so source files with impls cascade into E0425 ("cannot find function") even after the use-strip pass. Rust parser M1 scope only analyses top-level free fns, so any impl block in the file is by definition NOT the target. Strip every impl_item whole. Module-level structs, enums, type aliases, and consts stay intact. Regression: CCC source.rs 2026-05-19 — 4 pure top-level fns all failed Kani parse because impl Span / impl SourceManager bodies referenced compute_line_offsets / memchr_newline / FxHashMap after those fn defs were stripped. +1 test; 536 passing.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
1 similar comment
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
bmc-agent Rust path validation: first real-bug findings from the
AI-generated CCC after this session's 5 Rust-pipeline fixes.
1. bytes_to_str (frontend/preprocessor/utils.rs:39) — slice-index
panic when start > end or end > bytes.len(). pub fn with no
documented precondition; CBMC trips
core::slice::index::slice_index_fail. All ~9 in-tree callers
guard the invariant in surrounding parser loops, so the panic
is latent — but exposed as a pub API footgun.
2. skip_literal_bytes (utils.rs:46) — `let mut i = start + 1;`
overflows when start == usize::MAX. Debug build: panic
("attempt to add with overflow", CBMC's assertion.1). Release
build: wraps to 0 and the preprocessor silently advances
from a wrong position. saturating_add or a precondition assert
would fix.
The other 2 fns in the file (copy_literal_bytes_raw, …_to_string)
were blocked by an &mut-parameter limitation in the Kani harness
generator — filed as next-up bmc-agent improvement.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
CCC preprocessor utils.rs had 2 fns blocked at harness gen by ``&mut references in Kani harnesses are not yet supported``: copy_literal_bytes_raw(... result: &mut Vec<u8>) and copy_literal_bytes_to_string(... result: &mut String). Both are byte-index helpers in the AI-generated CCC preprocessor with the same Rust panic shape as the 2 v1-flagged fns in the same file. Added init blocks for three concrete &mut shapes: - &mut Vec<T>: ``let mut _owned = Vec::new(); &mut _owned`` - &mut String: ``let mut _owned = String::new(); &mut _owned`` - &mut <primitive>: ``let mut _owned = kani::any(); &mut _owned`` Anything else (&mut UserStruct) still hits the explicit NotImplementedError — we don't try to nondet-init arbitrary user-defined structs. v2 prepu run after the fix: 4/4 bug-prone fns produced verdicts, 4 real Rust panics confirmed (slice OOB + usize overflow) across the file's 8 pub byte-index helpers. Writeup updated. +3 tests; 539 passing.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
The single-line regex caught ``use super::utils::{`` but stopped at
the first newline, leaving the closing ``};`` and the wrapped
identifier list as orphan code. Every harness for CCC macro_defs.rs
failed rustc with "unexpected closing delimiter".
Switch to MULTILINE | DOTALL so ``[^;]*`` spans newlines until the
terminating ``;``, and wrap the whole match in a ``/* ... */`` block
comment (single line comments cannot span the body of a multi-line
use). Replace also escapes any embedded ``*/`` to avoid prematurely
closing the wrapper.
+1 test (multi-line case); +1 test rewritten for block-comment shape;
540 passing.
Crate-internal visibility restrictors fail standalone compilation
with E0433 ("there are too many leading `super` keywords").
Replace pub(super), pub(crate), pub(self), and pub(in <path>) with
plain pub. The harness loses the host-crate-specific scope
restriction, but that restriction is meaningless in standalone
compilation anyway and the type's accessibility to the harness is
what we need.
Regression: CCC macro_defs.rs 2026-05-19 — every struct field
used pub(super) and rustc rejected every harness with E0433
before Kani saw any of them.
+1 test; 541 passing. macro_defs.rs still hits a deeper
cross-file-import problem (calls fns defined in utils.rs that
won't resolve without cross-module stub generation) — separate
follow-up.
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
1 similar comment
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
…urvive sibling-strip
The harness generator's sibling-strip step kept only DIRECT callees of
the target fn (``func.callees``). Indirectly reachable helpers got
stripped and the harness failed to compile.
Concrete trigger: CCC src/backend/asm_expr.rs has a chain of mutually
recursive descent-parser fns — eval_add -> eval_mul -> eval_unary ->
eval_tokens -> eval_or -> eval_xor -> eval_and -> eval_shift -> eval_add.
eval_add's direct callees are {Ok, eval_mul, tokens.len}, so eval_mul
was kept but eval_unary (called by eval_mul) was stripped, breaking
compilation with E0425 "cannot find function eval_unary". 11 of 12
fns in the file failed verification pre-verdict.
Fix: compute transitive closure of callees against parsed_file.call_graph
before passing keep_callees to the stripper. With this, all 8 eval_*
helpers survive eval_add's harness; unrelated siblings
(char_escape_value, tokenize_expr, parse_*) are still stripped.
Tests: 3 new unit tests (closure walk, cycle termination, None fallback).
544 total tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The slice/Vec param-init paths used ``[T; N] = kani::any()``, which requires ``T: kani::Arbitrary``. Kani auto-derives Arbitrary for primitive types and tuples thereof, but NOT for user-defined enums/structs. Concrete trigger: CCC asm_expr.rs has fns taking ``&[ExprToken]`` where ``ExprToken`` is a local enum with an ``Op2(&'static str)`` variant. Trying to nondet-init the backing array fails with ``E0277 the trait bound ExprToken: kani::Arbitrary is not satisfied``. Surfaced AFTER the transitive-callee fix unblocked function-resolution. Fix: when the element type isn't a known-Arbitrary type (primitive), emit an empty Vec/slice backing instead. The function still gets a verdict — degenerate (only exercises the empty-input branch) but better than a build failure. Behaviour unchanged for primitive-element slices. Tests: 4 new unit tests (slice-of-user-type, &mut [T] of user-type, Vec<UserEnum>, sanity check that &[u8] still uses kani::any). 548 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t / spurious)
Previously the validator collapsed two distinct cases into REAL_BUG:
(a) panic reachable from an in-tree call site, and (b) panic reachable
only via the public API surface — cargo-fuzz / future-caller risk, but
no current call site produces the state. This over-reported the second
case as "active crashes" when in fact every in-tree caller maintains
the missing precondition via surrounding invariants (parser-loop
invariants, ELF parse-bounds checks).
The user (2026-05-19) asked for explicit separation: reachable bugs
should be REAL_BUG; latent panics on public API should be LATENT, with
a separate output stream so triage can pick severity tier.
Implementation:
- New ``CExOutcome.LATENT = "latent"`` enum value.
- ``_is_structural_panic(failing_property, trace)`` — substring match
against Rust/C panic markers (slice OOB, integer overflow, divide-by-
zero, capacity_overflow, …). Rejects custom postcondition violations
(those are LLM-spec artifacts, not pure panic sites).
- ``_is_publicly_callable(func)`` — duck-typed against Rust
``is_pub`` and C ``not is_static``.
- ``ValidationResult.is_latent_bug`` property + ``is_latent_bug`` key
in ``to_dict()``.
- ``_handle_spurious``: when refinement stabilises on a publicly-
callable fn with a structural-panic CEx, upgrades SPURIOUS → LATENT
instead of silently dropping the finding.
- Entry-function branch (no in-scope callers): same upgrade rule.
Previously called REAL_BUG with ``confirmed_system_entry``; this
over-reported. Non-public entry points still go to REAL_BUG
(conservative — kernel handlers etc.).
- ``BugReporter.save_latent_report()`` + ``ArtifactStore.save_latent_
report()`` write to ``{driver}/{fn}/latent_report.json`` next to the
existing ``bug_report.json``.
- Pipeline routes LATENT outcomes in all 3 validate sites (Phase 3,
Phase 3b recheck, Phase 3c CEGAR). Stashed on ``pipeline.latent_
reports`` for CLI access.
- CLI ``verify`` summary prints separate ``Latent panics on pub API: N``
block with property + function name.
Tests: 12 new unit tests covering the structural-panic substring
matrix, public-API detection across Rust/C signatures, and the
ValidationResult LATENT flag. Total: 560 passing (was 548).
This is a methodology change, not a heuristic tweak. Existing
real-bug-only consumers (eval scripts, baselines) are unchanged
because ``is_real_bug`` still means ``outcome == REAL_BUG``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nic markers The LATENT marker list missed Rust stdlib panic helpers for failed ``unwrap()`` / ``expect()`` on Option/Result. The property name lands as ``std::result::unwrap_failed.assertion.N`` or ``core::option::expect_failed.assertion.N`` — both are pure panic sites reachable via cargo-fuzz / future-caller, so they should route to LATENT (not REAL_BUG) when the entry-function branch fires. Regression: CCC ``bytes_to_str`` and ``copy_literal_bytes_to_string`` in utils.rs v3 (2026-05-19) — both have unwrap() on ``str::from_utf8`` and both got REAL_BUG verdicts that should have been LATENT under the new classification (they're public-API fns with no in-tree caller producing the bad-UTF-8 state). Tests: +1 unit test (13 total in test_latent_classification). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit (c3864af) routed all pub-API-reachable structural panics to LATENT regardless of threat model. User feedback 2026-05-19: under threat_model='security' (the default), the attacker IS a current caller — adversarial inputs cross the public API boundary, so a panic reachable from the pub API IS a real bug, not "future-caller risk." The LATENT bucket only makes sense under non-security threat models (safety, functional) where we explicitly care about crashes in current in-tree call paths rather than worst-case input fuzzing. Behavior: - threat_model='security' (default): pub API + structural panic + no in-tree caller → REAL_BUG. Matches the cargo-fuzz standard the user originally set ("if a fuzzer can trigger it, it's a real bug"). The reasoning string explicitly mentions threat model so triage knows why the bug was promoted. - threat_model='safety' / 'functional': same conditions → LATENT. The hardening bucket for in-tree-unreachable but pub-API-reachable panics. - Non-public entries always go to REAL_BUG regardless of threat model. Applied at both LATENT decision points: the entry-function branch (no in-scope callers) and the _handle_spurious refinement-stabilised branch. Tests: +1 unit test verifying the Config.threat_model field is honored and defaults to 'security'. 562 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e ones
User feedback 2026-05-19: BMC is more expressive than what the current
spec generation uses it for. The DSL and prompts are entirely defensive
— "what preconditions does this function need to be safe?" — which uses
a fraction of what Kani/CBMC can prove. BMC can also verify functional
correctness ("result == reference_computation"), algebraic identities
("align_up_64(v, a) % a == 0"), round-trip properties, structural
invariants. We were leaving that power on the table.
Phase 1: add an optional ``functional_spec`` field to the LLM response
JSON. When the LLM can express what a CORRECT implementation would
compute as a Rust/C boolean expression, it provides that spec; the
parser AND-merges it into the postcondition so downstream harness gen,
classification, refinement, and reporting consume it without any
further changes.
precondition: "data.len() >= offset + 2" ← unchanged (defensive)
postcondition: "result <= u16::MAX" ← unchanged (defensive)
functional_spec: "result == ((data[offset+1] as u16) << 8)
| (data[offset] as u16)" ← NEW
=>
merged post: "(result <= u16::MAX)
&& (result == ((data[offset+1] as u16) << 8)
| (data[offset] as u16))"
Behaviour:
- Old prompts without ``functional_spec`` parse identically (back-compat).
- Empty/trivial values ("", "true", "1", "n/a", "none") drop cleanly —
no "(post) && (true)" noise.
- If the defensive post is itself "true", the functional spec replaces
it entirely instead of wrapping.
- Both ENTRY_SPEC_PROMPT and INTERNAL_SPEC_PROMPT now ask for the
optional field with examples (reference equivalence, algebraic laws,
round-trip, structural).
- Prompts explicitly warn against fabricating functional specs — wrong
spec produces false positives — and explicitly say leave empty when
not expressible (e.g. parsers returning structured ASTs).
Expected effect on CCC sweep:
- Byte-helpers: spec asserts "result == manual LE decode"; Kani still
finds OOB panics (same as before) AND verifies the bit operations.
- align_up_64: spec captures the alignment invariant; Kani may find
arithmetic bugs that the defensive spec missed.
- Wrappers like bytes_to_str: spec compares to std::str::from_utf8;
Kani can prove the wrapper is faithful.
This unlocks BMC for functional-correctness verification within the
existing architecture. No new dataclasses, no new harness paths.
Tests: 6 new unit tests covering missing/empty/trivial/real functional
spec parsing. 568 passing total (was 562).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… by default Validation on linker_common/hash.rs + write.rs revealed the LLM was declining to emit functional_spec for every function — pre/post stayed "true" with the original prompt. Hand-injecting a functional spec on align_up_64 ((result >= val) && (result - val < align) && (result % align == 0)) successfully verified functional correctness, proving the wiring works. The LLM's reticence was the only gap. Revised prompt: - Reframe ``functional_spec`` from "optional" to "STRONGLY ENCOURAGED" — the LLM was reading the optional framing as "skip if unsure" by default. - Add 5 concrete spec templates with real Rust syntax: byte readers (u16::from_le_bytes equivalence), alignment (algebraic identity), hashing (fold expression), wrappers (round-trip), bit ops (direct formula). - Note tautological specs are CORRECT and should be emitted, not skipped as obvious — they catch interference / bugs in the function body even when the reference computation mirrors it. - Keep the anti-fabrication rule but reorder: the message is now "emit by default; skip only when genuinely impossible," not "skip by default; emit only when confident." Expected effect on next CCC sweep: more functional specs in spec.json, more verified-clean runs with strengthened postconditions, and potentially new bugs caught by Kani when the implementation diverges from the asserted reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prompts The dual-spec path (enable_dual_spec=True default) uses CALLER_HEAVY_SPEC_PROMPT and IMPL_HEAVY_SPEC_PROMPT, not the ENTRY_SPEC_PROMPT that previous commits modified. So Phase 1 functional specs were never being generated in practice — the pipeline routes through dual-spec which still asked for only pre/post. Validated by direct LLM query: with the original ENTRY_SPEC_PROMPT the model emits a perfect functional spec (e.g. "result == name.iter().fold(5381u32, |h, &b| h.wrapping_mul(33).wrapping_add(b as u32))" for gnu_hash). In the pipeline the dual-spec path stripped that field because the prompts didn't declare it. Both dual-spec prompts now request functional_spec with the same template-set as ENTRY_SPEC_PROMPT (byte readers, hash folds, alignment identities, wrappers). Tautological-reference framing preserved — the LLM should emit the reference computation even when it mirrors the body. 568 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live validation on linker_common/write.rs (with API key + functional spec landed) revealed a regression: the LLM emits rich functional specs for state-mutating fns using ``old(buf.len())`` to reference pre-call state. Kani's plain ``kani::assert`` can't see pre-state, and the harness generator doesn't snapshot mutable inputs before the call, so the harness fails to compile with ``E0425: cannot find function 'old'`` and the function is SKIPPED — neither REAL_BUG nor LATENT. Concrete fallout: write.rs went from 3 bugs (align_up_64, pad_to, write_elf64_phdr_at) to 1 bug + 2 skipped harnesses, because the functional spec contained ``old()``. Quick fix: detect ``old\\(`` in the functional spec and drop the spec entirely. Loses some precision (the post stays at the defensive "true") but harnesses compile and the function gets a real verdict. Proper fix (deferred): snapshot mutable inputs in the harness (``let _pre_buf = buf.clone();``) and substitute ``old(X)`` → ``_pre_X`` before passing to Kani. Then state-mutating fns get their full functional specs verified. The regex ``\\bold\\s*\\(`` is word-bounded so it doesn't match identifiers like ``old_threshold`` or ``old_var``. Tests: +3 unit tests (with-old / nested-old / without-old). 571 total (was 568). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1's previous quick-fix blanket-dropped functional specs containing old() because Kani's plain kani::assert can't see pre-call state. That suppressed useful specs on every state-mutating function (pad_to, write_elf64_phdr, write_elf64_shdr, codegen passes — anywhere the LLM needs to compare pre vs post buffer state). This commit adds the proper substitution: - New helper ``_extract_old_snapshots(postcondition)`` finds each ``old(EXPR)`` call with paren-balanced extraction (handles nesting), generates a snapshot binding ``let _pre_N = (EXPR);``, and rewrites the postcondition to use ``_pre_N`` in place of ``old(EXPR)``. - Slice-indexing expressions (containing ``[``) get ``.to_vec()`` so the snapshot owns the data and survives the about-to-be-mutated borrow. Scalar exprs like ``buf.len()`` stay as-is (assume Copy). - Nested ``old(...)`` is handled: the inner old() is stripped during snapshot generation because at snapshot point everything is already pre-state. So ``old(buf[..old(buf.len())])`` becomes ``let _pre_0 = (buf[..buf.len()]).to_vec();`` and the post substitutes the outer call with ``_pre_0``. - Snapshots are emitted in the harness BETWEEN the precondition assumption and the function call, so they capture in-spec pre-state. - Identifier lookbehind: ``cold(`` and ``_old(`` are not matched — only the ``old(`` form, optionally preceded by whitespace or non-identifier characters. The earlier ``_parse_llm_spec_response`` drop logic is removed since the backend now handles old() natively. Updated 3 tests in test_functional_spec.py from "spec dropped" → "spec preserved"; added 5 new tests in test_kani.py covering the snapshot helper (no-old passthrough, scalar snapshot, slice to_vec, nested old strip, identifier guard). Tests: 576 passing (was 571, +5 new). Expected effect on next write.rs run: pad_to, write_elf64_phdr, write_elf64_shdr no longer SKIPPED for E0425; their functional specs verify against the implementation alongside the defensive checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…positives
Phase 1 validation on CCC long_double.rs surfaced a systematic Phase 1
weakness: when the LLM-generated functional_spec contains plain
``+``/``-``/``*`` arithmetic on the function's inputs, Kani evaluates
the spec on adversarial nondet values (``usize::MAX``, ``i64::MIN``,
etc.) and the SPEC itself overflows. The pipeline classifies that as
``check_<fn>.assertion.N: attempt to add with overflow`` — a CEX in
the harness wrapper, not in the function body — and over-reports it
as a REAL_BUG.
Concrete fallout: long_double.rs reported 4 REAL_BUG (f64_decompose,
make_x87_infinity, shift_right_256_with_grs, shifted_limb) — all
spec-evaluation overflow, no actual bug in the function bodies.
inline_asm.rs reported 1 REAL_BUG (is_generic_gp_constraint) — also
spec fabrication (over-simple functional spec).
Two-pronged fix:
1. Prompt update (prompts.py) — add explicit "SPEC-EVALUATION OVERFLOW"
warning to the entry-spec prompt. Tells the LLM to use
``wrapping_add``/``wrapping_mul`` in functional specs, or guard
the precondition to exclude the overflowing inputs. With concrete
wrong / right examples.
2. Classifier filter (cex_validator._witness_obvious_artifact) — when
the failing property starts with ``check_`` (harness wrapper) AND
the trace mentions ``attempt to {add,sub,mul,shift} with overflow``,
downgrade to model-artifact (spec-evaluation overflow), not REAL_BUG.
This catches false positives even when the LLM ignored the prompt
warning.
The filter is precise: body-level overflow (``<fn>.assertion.N`` without
the check_ prefix) still routes through the REAL_BUG / LATENT
classification normally. Postcondition violations with non-overflow
traces also pass through as real bugs (those are genuine spec/impl
mismatches, like the align_up misalignment finding).
Tests: 7 new unit tests in test_spec_overflow_filter.py covering
add/sub/mul/shift overflow filtering, body-level overflow not filtered,
postcondition violations not filtered, check_*-named fns edge case.
583 passing (was 576, +7 new).
This makes Phase 1 noticeably more conservative in practice: it stays
silent on inputs the LLM should have guarded, instead of producing
false-positive bug reports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive writeup of the session: 25 real bugs found in CCC (24 defensive panic-class + 1 functional semantic), 4+ functional correctness proofs (hash.rs full djb2/SysV equivalence, alignment invariants on align_up_64, ELF byte-layout on write_elf64_phdr/shdr), 5 Phase 1 false positives (inline_asm + 4 long_double). Includes: full bug table organized by file + function class, list of 13 bmc-agent improvements landed this session (with commit hashes), what works well vs what doesn't, threat-model framing clarification, and the methodology observation about caller-guards-match-inferred-pre. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… positives Re-running long_double.rs with --enable-realism-check confirmed the hypothesis: all 3 Phase 1 finds (f64_decompose, shift_right_256_with_grs, make_x87_infinity) were verdicted UNREALISTIC with high confidence by the realism check. CLI suppression dropped raw count 3 → final count 0. Concrete realism reasoning examples: - f64_decompose: "CBMC assertion violation with no counterexample witness values recorded" - shift_right_256_with_grs: "verification artifact from CBMC's symbolic execution, not a real bug" - make_x87_infinity: identified the x87 byte-layout convention mismatch correctly Updated the session summary with: - Validated mitigation (realism-check works, not just theoretical) - Recommended invocation for future CCC runs Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bmc-agent results1
All verdicts (1)[
{
"source": "examples/bmc-action-selftest/safe_double.c",
"function": "safe_double",
"outcome": "real_bug",
"failing_property": "safe_double.overflow.1",
"reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
}
] |
Self-test for bmc-agent-pr action
This PR removes the bounds check in
examples/bmc-action-selftest/safe_double.cto trigger the bmc-agent self-test workflow.Expected behaviour:
bmc-agent self-testworkflow fires on this PR.bmc-agent verifywith--enable-flag-selection.--signed-overflow-check(and/or--unsigned-overflow-check).safe_double.overflow.1— signed integer overflow onx * 2for largex.Validation goal: confirm the action runs in an actual GitHub Actions runner (not just locally in Docker as in commit 327f628).