selftest: remove bounds check in safe_double by theyoucheng · Pull Request #1 · agentic-prover/aprover

theyoucheng · 2026-05-18T17:05:14Z

Self-test for bmc-agent-pr action

This PR removes the bounds check in examples/bmc-action-selftest/safe_double.c to trigger the bmc-agent self-test workflow.

Expected behaviour:

The bmc-agent self-test workflow fires on this PR.
The action preprocesses the changed file, runs bmc-agent verify with --enable-flag-selection.
The flag-selector picks --signed-overflow-check (and/or --unsigned-overflow-check).
CBMC surfaces safe_double.overflow.1 — signed integer overflow on x * 2 for large x.
A comment is posted on this PR summarising the finding.

Validation goal: confirm the action runs in an actual GitHub Actions runner (not just locally in Docker as in commit 327f628).

Triggers the bmc-agent self-test workflow. Expected outcome: a PR comment from the action reporting REAL_BUG on safe_double, property safe_double.overflow.1, root cause "signed integer overflow on x * 2 for x near INT_MAX". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

action.yml cannot use ${{ github.token }} expressions in its env block — that context only resolves inside workflow files. Trying to do so fails the runner with "Unrecognized named-value: 'github'" at action load time (live-runner test 2026-05-18). Fix: declare github-token as an explicit input on the action, mapped to GITHUB_TOKEN in the env block via ${{ inputs.github-token }}. The calling workflow now passes ${{ secrets.GITHUB_TOKEN }} (or a PAT) through that input. Updated: action.yml — new github-token input. .github/workflows/bmc-agent-selftest.yml — pass github-token. actions/bmc-agent-pr/example-workflow.yml — same. actions/bmc-agent-pr/README.md — document the input. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous fix replaced the env-block expression but left a stray ${{ github.token }} inside the description string of the github-token input. GitHub Actions evaluates expressions in descriptions too, so the runner still failed with 'Unrecognized named-value: github' at load time. Rephrase as plain `github.token` code-tick to avoid the ${{ sequence entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T17:17:16Z

bmc-agent results

No real_bug verdicts across 1 changed file(s).

All verdicts (\(0))

[]

Two changes that turn the action's previous black-box failure mode into a debuggable one: 1. entrypoint.sh now dumps a 40-line tail of the bmc-agent log even on rc=0. The first live-runner self-test (run 26048828588) reported 0 real_bug findings in 1s — way too fast for any real LLM call. With rc=0 the previous entrypoint suppressed the log entirely, making silent-failure indistinguishable from clean run. 2. Workflow uploads .bmc-agent-artifacts/ via actions/upload-artifact so the per-function spec/harness/CEX/log directory survives the ephemeral runner for offline triage. Also fixes a cosmetic markdown bug in the PR comment: literal \(0) appeared where the verdict count should have been due to over-eager backslash-escaping inside the heredoc; now uses an explicit shell variable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T17:23:36Z

bmc-agent results

No real_bug verdicts across 1 changed file(s).

All verdicts (0)

[]

Self-test runs report LLM 401 Missing Authentication on every Phase 1 spec generation, even with secrets.BMC_AGENT_LLM_KEY supplied via the action input. Add a diagnostics group to the entrypoint that prints presence and length (NOT the value) of ANTHROPIC_API_KEY, BMC_AGENT_LLM_BASE_URL, BMC_AGENT_LLM_MODEL, and GITHUB_TOKEN. The value-prefix in the log is just three chars (e.g. 'sk-') so secret-redaction continues to mask the rest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T17:26:59Z

bmc-agent results

No real_bug verdicts across 1 changed file(s).

All verdicts (0)

[]

The previous secret upload (gh secret set --body -) somehow stored only 1 char of the key; the action's diagnostic group showed ANTHROPIC_API_KEY length=1 inside the container. Re-uploaded the 73-byte OpenRouter key via stdin redirect (gh secret set < /tmp/key) which preserves the full value. This empty commit triggers a fresh workflow run that should now see the full key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round-2 r8125_fiber hunt surfaced rtl8125_check_fiber_mode_support as a candidate REAL_BUG via property overflow.1 — CBMC's --conversion-check flagged the u32->u8 cast in u8 tmp = (u8)rtl8125_mac_ocp_read(tp, 0xD006); even though the cast is explicit driver-IO code taking the low byte of a 32-bit register read. This is intentional, defined C behavior, not a bug. Add a realism-stage witness detector that recognizes the pattern: CBMC description matches 'arithmetic overflow on <kind> to <kind> type conversion in (T)expr' AND target type T is in a narrow-int allowlist (u8/u16/u32, __u8/__u16/__u32, uint8_t/uint16_t/uint32_t, char/short, etc.). Detector returns UNREALISTIC without an LLM call, matching the pattern used by the USB-serial, PHY, and netdev-private detectors. Conservative — we don't fire on user-defined narrowing target types where unintended narrowing might be the actual bug. Also includes the MMIO-region-size harness fix landed (accidentally) in 00da727 — void *mmio_addr now gets a 4 KiB backing region instead of the 1-byte __CPROVER_r_ok(p, sizeof(void)) that produced spurious OOB on legitimate register accesses at offsets > 1 (r8125_dash FP). +8 regression tests; 498 passing total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T17:31:01Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

github-actions · 2026-05-18T17:31:27Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

llama.cpp ggml-alloc.c sweep (2026-05-18) found that every one of 87 functions failed CBMC parsing with 'redefinition of body of struct _IO_FILE' (exit code 6) — the preprocessed source contains the full glibc internal struct body, and CBMC's own libc redefines it when the harness references FILE/pthread_mutex_t/etc. Add _strip_glibc_internal_struct_bodies pass that converts struct _IO_FILE { ... }; into struct _IO_FILE; /* glibc-internal body stripped */ while leaving project-defined structs alone. The body strip: - matches names starting with _IO_, __, or _G_ (glibc-internal convention; kernel uses __kernel_* etc. but kernel-mode is a no-op so those survive); - walks brace depth correctly through nested unions / anonymous structs / arrays (struct __pthread_mutex_s has nested members); - is comment- and string-literal-aware so it doesn't trip on struct-keyword occurrences inside doc text. Wired into the harness pipeline alongside _strip_glibc_internal_typedefs in two places (real-libc dynamic harness path + CBMC verification harness path). +4 regression tests; 502 passing total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T18:24:39Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

…list Re-run of ggml-alloc.c with the prefix-only strip (commit 80e3b3f) still produced 95 CBMC parse errors — glibc structs without _IO_/__/_G_ prefix survived (timeval, timespec, random_data, drand48_data). Add an allowlist of known POSIX/glibc struct names that CBMC's built-in libc redefines: timeval/timespec/itimerval/itimerspec, tm, stat/stat64, sockaddr family + msghdr/iovec, addrinfo/hostent, lconv, rusage/rlimit, passwd/group, dirent/dirent64, sigaction/siginfo_t, flock, termios, utsname, random_data/drand48_data, sched_param, fpos_t. List grown empirically from observed 'redefinition of body of struct X' errors; expand as more surface. +1 regression test (503 total). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T19:07:45Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

First real-OSS sweep result from the autonomous overnight work. 87 functions in ggml/src/ggml-alloc.c; 22 completed a full BMC + Phase 3 cycle and verified clean of any real bug; 26 hit CBMC parse errors (struct/typedef redefinition conflicts even with --real-libc); 0 confirmed real bugs. Most informative candidate: aligned_offset's offset+align unsigned overflow, surfaced by --unsigned-overflow-check, then iteratively constrained-out by the feedback loop after it learned that all callers bound offset by max_buffer_size. Demonstrates the realism + feedback chain distinguishing theoretical-overflow from exploitable-overflow on a recognized AI-inference codebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T21:38:38Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

Two issues uncovered by the ggml.c run (all 439 functions failed CBMC parse/convert; 0 verdicts produced but the pipeline still emitted "AMC Pipeline END: 0 real bug(s) found" — ambiguous between clean-verify and total parse failure). 1. Coverage diagnostics. AMCPipeline.run() now scans every saved cbmc_result.json after Phase 4 and aggregates failure modes: undefined symbols (build-config macros like GGML_VERSION), bad assert arity, struct-syntax errors, and other parse/conv errors. Writes coverage_diagnostics.json into the driver dir and logs a single ::warning block when ≥1 function failed pre-verdict. When ≥50% of functions failed, escalates to "treat this run as BLOCKED, not as a clean verify." 2. owns(scope, ptr) DSL form. The existing _OWNS_RE matched the 2-arg form with group(1)="ctx, a" and produced `assert(ctx, result != NULL)` which CBMC rejects as a 2-arg assert. Two-group regex now drops the scope and keeps only the pointer. Affected ~34 ggml.c harnesses in the just-completed run. ggml.c failure-mode writeup at findings/oss-llamacpp/. ggml-quants.c re-run in flight uses the fixed code path. +7 tests; 510 passing.

Two issues uncovered by the ggml-quants.c run: 3 confirmed_dynamic SIGSEGV findings that were all the same FP class, plus a tree-sitter misparse that masked struct return types in ggml.c. 1. Address-taken detector in cex_validator. The classifier promoted iq1_sort_helper / iq2_compare_func / iq3_compare_func to real_bug because _find_callers() returned empty (the only "callers" are qsort(..., fn) which the call graph misses). The dynamic validator then ran them with NULL pointers and got SIGSEGV. New _is_address_taken() scans the source (with comments and string literals neutralised) for any reference to the function name not followed by '('. When found, the no-callers path now returns UNRESOLVED instead of REAL_BUG, with reasoning citing the library-callback indirection. Generalises to qsort/bsearch/ pthread_create/atexit/signal and struct function-pointer tables. 2. Tree-sitter recovers struct keyword through macro prefix. For "GGML_API struct ggml_tensor * fn(...)", tree-sitter consumes "GGML_API struct" as a stray declaration and the function's type field becomes the bare tag "ggml_tensor *". The harness then writes "ggml_tensor *result = ..." which CBMC rejects because ggml_tensor is not a typedef. Parser now scans backwards from the type node's start byte, over whitespace only and stopping at any statement separator, and recovers struct/union/enum if present. Affected 2 functions in the ggml.c run; the pattern hits any C library that uses an export macro plus tag types. ggml-quants clean-verify writeup at findings/oss-llamacpp/. +9 tests; 519 passing.

ggml-cpu/quants.c run blocked at 43/43 with ``syntax error before '=''`` because every signature uses ``T * GGML_RESTRICT s`` and tree-sitter parsed ``GGML_RESTRICT`` as the parameter name, dropping the real ``s`` into a sibling ERROR node. The harness then declared three local variables all named ``GGML_RESTRICT`` and CBMC rejected the redefinition. When (a) the parsed parameter name looks like a qualifier macro (ALL_CAPS-with-underscore, or leading ``__``) and (b) the next sibling in the parameter list is an ERROR node containing a single identifier, fold the macro into the type string and use the ERROR identifier as the parameter name. Conservative — single-letter caps (no underscore, no leading ``__``) are not treated as macros. Generalises beyond ggml: same pattern hits any C codebase using __restrict__, __nonnull__, project-local NORETURN/HOT/COLD attribute macros, etc. +2 tests; 521 passing.

ggml-cpu/quants.c run produced 7 confirmed_dynamic findings on the quantize_row_* family, all of the form: void quantize_row_q5_K(... , int64_t k) { assert(k % QK_K == 0); // <-- fires under CBMC ... } QK_K is a #define'd block-size constant (256). The source-assert extractor already promotes ``assert(expr)`` at the top of a function to ``__CPROVER_assume(expr)`` when ``expr`` only references the function's parameters and a small allowlist of stdint maxes — but QK_K wasn't on the allowlist, so the assert wasn't promoted and CBMC explored k=257 and friends, tripping the source assert. Allow any ALL_CAPS_WITH_UNDERSCORE identifier as a stable compile-time constant (the conventional C name shape for #defines and enum members). Two negative tests guard against lowercase free identifiers and __builtin_/_Static_ names being swallowed. Generalises beyond ggml: PAGE_SIZE, L1_CACHE_BYTES, BLOCK_SIZE, MAX_USERS, etc. — the precondition-assert idiom is ubiquitous in Linux kernel modules and embedded C codebases. ggml-cpu/quants.c clean-verify writeup at findings/oss-llamacpp/. +3 tests; 524 passing.

github-actions · 2026-05-18T23:22:28Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

C source files commonly define a function twice under a preprocessor ``#ifdef X /* real */ #else /* stub */ #endif``. Tree-sitter parses both branches because it doesn't process the preprocessor; without disambiguation, the SECOND (stub) definition overwrites the first (real) one in our functions/call_graph/function_bodies maps. The real body's callees vanish, so any function the real body called gets mis-classified as a system-entry point. Regression: curl/parsedate.c run 2026-05-19 reported ``datenum`` and ``time2epoch`` as confirmed_system_entry findings because the 3-stmt ``#else`` stub for ``parsedate`` overwrote the 30-stmt real body and dropped both functions from the call graph. Now: when a name is seen a second time, keep the entry with the longer body. Stubs are short (``{ (void)x; return 0; }``); real implementations are multi-statement. First-wins on ties so the rule is deterministic in file order. +2 tests; 526 passing.

17/17 functions verified after the duplicate-definition fix. The v2 run produced 2 confirmed_system_entry false positives on ``datenum`` and ``time2epoch`` because tree-sitter parsed both branches of the ``#ifndef CURL_DISABLE_PARSEDATE / #else / #endif`` guard and the stub body overwrote the real one, dropping ``parsedate``'s callees from the graph. After the parser fix the v3 run is fully clean.

github-actions · 2026-05-18T23:28:29Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

curl's ``UNITTEST CURLUcode parse_port(...)`` (where ``UNITTEST`` expands to ``static`` or empty) was mis-parsed: tree-sitter treated ``UNITTEST`` as the entire return type and stashed ``CURLUcode`` in a sibling ERROR node. The harness emitted ``UNITTEST result = parse_port(...);``, which after CPP became ``static result = ...`` and CBMC rejected with "expected constant expression". Mirror the GGML_RESTRICT param-name recovery for return types: when the parsed type looks like a storage-class macro and the function_definition has an ERROR sibling containing a single identifier, fold the macro into the type prefix and use the ERROR identifier as the actual return type. Also extend the macro-detection heuristic to cover single-word uppercase macros of ≥4 chars (UNITTEST, EXPORT, INLINE) in addition to the existing ALL_CAPS_WITH_UNDERSCORE shape. Regression: curl/urlapi.c run 2026-05-19, parse_port harness failed CBMC with the expected-constant error. Generalises to any C codebase using project-local linkage macros (OpenSSL ``OSSL_API``, kernel ``__init``/``__cold``, etc.). +2 tests; 528 passing. urlapi.c clean-verify writeup at findings/oss-curl/.

github-actions · 2026-05-18T23:36:18Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

CCC encoding.rs (Anthropic's Claude-written C compiler, 2.7k stars, 96% Rust) is the first real-world AI-generated Rust target we ran bmc-agent against. Two of 4 functions in the encoding utility module (bytes_to_string, encode_non_utf8) timed out at the default Kani slice_bound=4 / unwind=4 / timeout=120s because each input byte fans out into String::from_utf8 / std::str::from_utf8 internal state-space exploration. Verified experimentally that the same encode_non_utf8 harness regenerated with slice_bound=1, unwind=2 verifies clean in ~40 s. Changes: - KaniBackend.generate_harness gains an optional slice_bound_override kwarg. - KaniBackend.check gains unwind_override / timeout_override. - BMCEngine.check_function (Rust path only) detects "kani timed out", regenerates the harness with (slice_bound=2, unwind=4), then (slice_bound=1, unwind=2), saving each retry harness and re-checking. Stops as soon as a non-timeout verdict lands. Each retry overwrites the saved harness so the artifact reflects the configuration that actually produced the recorded verdict, keeping classification/realism stages downstream consistent with the counterexample they're reasoning about. +2 tests; 530 passing.

github-actions · 2026-05-19T05:51:59Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

github-actions · 2026-05-19T06:03:06Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

Two interlocking fixes that unblock standalone Kani verification on real-world Rust crates whose source files reference crate-local types. CCC const_arith.rs run 2026-05-19 — 3 selected primitive functions (wrap_result, unsigned_op, bool_to_i64) all failed Kani parse because the file's preamble pulled in two ``use crate::*;`` lines AND a dozen sibling fns referenced those imports in their signatures. 1. _strip_crate_local_use_statements Comment out any ``use crate::*``, ``use super::*``, ``use self::*`` (and ``pub use`` variants) in the included full source. std::, core::, alloc::, and external crate paths are untouched. 2. _strip_crate_local_fn_items Drop every top-level fn from the included source EXCEPT the target and any names in the recorded callee set. Sibling fns routinely reference the SAME types brought in by the now-stripped use lines (``op: &BinOp``) and would fail rustc parse with E0412. Tree-sitter-based — regex isn't reliable for brace matching in Rust (lifetimes, generics, raw strings). Module-level consts, type aliases, and external use statements stay intact. The harness now has: preamble + target fn + listed callees. +4 tests; 535 passing.

github-actions · 2026-05-19T06:12:42Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

…eout CCC source.rs v2 — 4 pure top-level fns all hit "loop unwind bound exhausted" at the default unwind=4 because each function has at least one inner loop that runs up to slice_bound (4) iterations PLUS some constant overhead. Kani's heuristic needs unwind ≥ loop-bound + 1; 4 is borderline and routinely fails. Add a separate retry path for unwind-exhausted: one shot at unwind=max(4×default, 16), keeping slice_bound intact. Distinct from the timeout-retry chain (which shrinks state). The chain now is: 1. Default (unwind=4, bound=4, 120 s) 2. On "unwind bound exhausted": bump unwind to 16 3. On "timed out": (separate path) shrink bound 2 → 1, tighten unwind CCC source.rs v3 with both retries: 3/4 fns verified clean, 1 inconclusive. v2 had 4/4 failed at parse/Kani. compute_line_offsets exercises BOTH retries — unwind-bump first lands a timeout, then shrink-retry verifies clean.

The fn-stripper from 9e31a9f covered top-level free fns but missed impl blocks. impl methods reference module-local sibling fns that the stripper just removed, so source files with impls cascade into E0425 ("cannot find function") even after the use-strip pass. Rust parser M1 scope only analyses top-level free fns, so any impl block in the file is by definition NOT the target. Strip every impl_item whole. Module-level structs, enums, type aliases, and consts stay intact. Regression: CCC source.rs 2026-05-19 — 4 pure top-level fns all failed Kani parse because impl Span / impl SourceManager bodies referenced compute_line_offsets / memchr_newline / FxHashMap after those fn defs were stripped. +1 test; 536 passing.

github-actions · 2026-05-19T06:19:13Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

github-actions · 2026-05-19T06:19:41Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

bmc-agent Rust path validation: first real-bug findings from the AI-generated CCC after this session's 5 Rust-pipeline fixes. 1. bytes_to_str (frontend/preprocessor/utils.rs:39) — slice-index panic when start > end or end > bytes.len(). pub fn with no documented precondition; CBMC trips core::slice::index::slice_index_fail. All ~9 in-tree callers guard the invariant in surrounding parser loops, so the panic is latent — but exposed as a pub API footgun. 2. skip_literal_bytes (utils.rs:46) — `let mut i = start + 1;` overflows when start == usize::MAX. Debug build: panic ("attempt to add with overflow", CBMC's assertion.1). Release build: wraps to 0 and the preprocessor silently advances from a wrong position. saturating_add or a precondition assert would fix. The other 2 fns in the file (copy_literal_bytes_raw, …_to_string) were blocked by an &mut-parameter limitation in the Kani harness generator — filed as next-up bmc-agent improvement.

github-actions · 2026-05-19T06:46:38Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

CCC preprocessor utils.rs had 2 fns blocked at harness gen by ``&mut references in Kani harnesses are not yet supported``: copy_literal_bytes_raw(... result: &mut Vec<u8>) and copy_literal_bytes_to_string(... result: &mut String). Both are byte-index helpers in the AI-generated CCC preprocessor with the same Rust panic shape as the 2 v1-flagged fns in the same file. Added init blocks for three concrete &mut shapes: - &mut Vec<T>: ``let mut _owned = Vec::new(); &mut _owned`` - &mut String: ``let mut _owned = String::new(); &mut _owned`` - &mut <primitive>: ``let mut _owned = kani::any(); &mut _owned`` Anything else (&mut UserStruct) still hits the explicit NotImplementedError — we don't try to nondet-init arbitrary user-defined structs. v2 prepu run after the fix: 4/4 bug-prone fns produced verdicts, 4 real Rust panics confirmed (slice OOB + usize overflow) across the file's 8 pub byte-index helpers. Writeup updated. +3 tests; 539 passing.

github-actions · 2026-05-19T06:50:21Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

The single-line regex caught ``use super::utils::{`` but stopped at the first newline, leaving the closing ``};`` and the wrapped identifier list as orphan code. Every harness for CCC macro_defs.rs failed rustc with "unexpected closing delimiter". Switch to MULTILINE | DOTALL so ``[^;]*`` spans newlines until the terminating ``;``, and wrap the whole match in a ``/* ... */`` block comment (single line comments cannot span the body of a multi-line use). Replace also escapes any embedded ``*/`` to avoid prematurely closing the wrapper. +1 test (multi-line case); +1 test rewritten for block-comment shape; 540 passing.

Crate-internal visibility restrictors fail standalone compilation with E0433 ("there are too many leading `super` keywords"). Replace pub(super), pub(crate), pub(self), and pub(in <path>) with plain pub. The harness loses the host-crate-specific scope restriction, but that restriction is meaningless in standalone compilation anyway and the type's accessibility to the harness is what we need. Regression: CCC macro_defs.rs 2026-05-19 — every struct field used pub(super) and rustc rejected every harness with E0433 before Kani saw any of them. +1 test; 541 passing. macro_defs.rs still hits a deeper cross-file-import problem (calls fns defined in utils.rs that won't resolve without cross-module stub generation) — separate follow-up.

github-actions · 2026-05-19T06:55:15Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

github-actions · 2026-05-19T06:56:33Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

…urvive sibling-strip The harness generator's sibling-strip step kept only DIRECT callees of the target fn (``func.callees``). Indirectly reachable helpers got stripped and the harness failed to compile. Concrete trigger: CCC src/backend/asm_expr.rs has a chain of mutually recursive descent-parser fns — eval_add -> eval_mul -> eval_unary -> eval_tokens -> eval_or -> eval_xor -> eval_and -> eval_shift -> eval_add. eval_add's direct callees are {Ok, eval_mul, tokens.len}, so eval_mul was kept but eval_unary (called by eval_mul) was stripped, breaking compilation with E0425 "cannot find function eval_unary". 11 of 12 fns in the file failed verification pre-verdict. Fix: compute transitive closure of callees against parsed_file.call_graph before passing keep_callees to the stripper. With this, all 8 eval_* helpers survive eval_add's harness; unrelated siblings (char_escape_value, tokenize_expr, parse_*) are still stripped. Tests: 3 new unit tests (closure walk, cycle termination, None fallback). 544 total tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The slice/Vec param-init paths used ``[T; N] = kani::any()``, which requires ``T: kani::Arbitrary``. Kani auto-derives Arbitrary for primitive types and tuples thereof, but NOT for user-defined enums/structs. Concrete trigger: CCC asm_expr.rs has fns taking ``&[ExprToken]`` where ``ExprToken`` is a local enum with an ``Op2(&'static str)`` variant. Trying to nondet-init the backing array fails with ``E0277 the trait bound ExprToken: kani::Arbitrary is not satisfied``. Surfaced AFTER the transitive-callee fix unblocked function-resolution. Fix: when the element type isn't a known-Arbitrary type (primitive), emit an empty Vec/slice backing instead. The function still gets a verdict — degenerate (only exercises the empty-input branch) but better than a build failure. Behaviour unchanged for primitive-element slices. Tests: 4 new unit tests (slice-of-user-type, &mut [T] of user-type, Vec<UserEnum>, sanity check that &[u8] still uses kani::any). 548 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t / spurious) Previously the validator collapsed two distinct cases into REAL_BUG: (a) panic reachable from an in-tree call site, and (b) panic reachable only via the public API surface — cargo-fuzz / future-caller risk, but no current call site produces the state. This over-reported the second case as "active crashes" when in fact every in-tree caller maintains the missing precondition via surrounding invariants (parser-loop invariants, ELF parse-bounds checks). The user (2026-05-19) asked for explicit separation: reachable bugs should be REAL_BUG; latent panics on public API should be LATENT, with a separate output stream so triage can pick severity tier. Implementation: - New ``CExOutcome.LATENT = "latent"`` enum value. - ``_is_structural_panic(failing_property, trace)`` — substring match against Rust/C panic markers (slice OOB, integer overflow, divide-by- zero, capacity_overflow, …). Rejects custom postcondition violations (those are LLM-spec artifacts, not pure panic sites). - ``_is_publicly_callable(func)`` — duck-typed against Rust ``is_pub`` and C ``not is_static``. - ``ValidationResult.is_latent_bug`` property + ``is_latent_bug`` key in ``to_dict()``. - ``_handle_spurious``: when refinement stabilises on a publicly- callable fn with a structural-panic CEx, upgrades SPURIOUS → LATENT instead of silently dropping the finding. - Entry-function branch (no in-scope callers): same upgrade rule. Previously called REAL_BUG with ``confirmed_system_entry``; this over-reported. Non-public entry points still go to REAL_BUG (conservative — kernel handlers etc.). - ``BugReporter.save_latent_report()`` + ``ArtifactStore.save_latent_ report()`` write to ``{driver}/{fn}/latent_report.json`` next to the existing ``bug_report.json``. - Pipeline routes LATENT outcomes in all 3 validate sites (Phase 3, Phase 3b recheck, Phase 3c CEGAR). Stashed on ``pipeline.latent_ reports`` for CLI access. - CLI ``verify`` summary prints separate ``Latent panics on pub API: N`` block with property + function name. Tests: 12 new unit tests covering the structural-panic substring matrix, public-API detection across Rust/C signatures, and the ValidationResult LATENT flag. Total: 560 passing (was 548). This is a methodology change, not a heuristic tweak. Existing real-bug-only consumers (eval scripts, baselines) are unchanged because ``is_real_bug`` still means ``outcome == REAL_BUG``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nic markers The LATENT marker list missed Rust stdlib panic helpers for failed ``unwrap()`` / ``expect()`` on Option/Result. The property name lands as ``std::result::unwrap_failed.assertion.N`` or ``core::option::expect_failed.assertion.N`` — both are pure panic sites reachable via cargo-fuzz / future-caller, so they should route to LATENT (not REAL_BUG) when the entry-function branch fires. Regression: CCC ``bytes_to_str`` and ``copy_literal_bytes_to_string`` in utils.rs v3 (2026-05-19) — both have unwrap() on ``str::from_utf8`` and both got REAL_BUG verdicts that should have been LATENT under the new classification (they're public-API fns with no in-tree caller producing the bad-UTF-8 state). Tests: +1 unit test (13 total in test_latent_classification). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous commit (c3864af) routed all pub-API-reachable structural panics to LATENT regardless of threat model. User feedback 2026-05-19: under threat_model='security' (the default), the attacker IS a current caller — adversarial inputs cross the public API boundary, so a panic reachable from the pub API IS a real bug, not "future-caller risk." The LATENT bucket only makes sense under non-security threat models (safety, functional) where we explicitly care about crashes in current in-tree call paths rather than worst-case input fuzzing. Behavior: - threat_model='security' (default): pub API + structural panic + no in-tree caller → REAL_BUG. Matches the cargo-fuzz standard the user originally set ("if a fuzzer can trigger it, it's a real bug"). The reasoning string explicitly mentions threat model so triage knows why the bug was promoted. - threat_model='safety' / 'functional': same conditions → LATENT. The hardening bucket for in-tree-unreachable but pub-API-reachable panics. - Non-public entries always go to REAL_BUG regardless of threat model. Applied at both LATENT decision points: the entry-function branch (no in-scope callers) and the _handle_spurious refinement-stabilised branch. Tests: +1 unit test verifying the Config.threat_model field is honored and defaults to 'security'. 562 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e ones User feedback 2026-05-19: BMC is more expressive than what the current spec generation uses it for. The DSL and prompts are entirely defensive — "what preconditions does this function need to be safe?" — which uses a fraction of what Kani/CBMC can prove. BMC can also verify functional correctness ("result == reference_computation"), algebraic identities ("align_up_64(v, a) % a == 0"), round-trip properties, structural invariants. We were leaving that power on the table. Phase 1: add an optional ``functional_spec`` field to the LLM response JSON. When the LLM can express what a CORRECT implementation would compute as a Rust/C boolean expression, it provides that spec; the parser AND-merges it into the postcondition so downstream harness gen, classification, refinement, and reporting consume it without any further changes. precondition: "data.len() >= offset + 2" ← unchanged (defensive) postcondition: "result <= u16::MAX" ← unchanged (defensive) functional_spec: "result == ((data[offset+1] as u16) << 8) | (data[offset] as u16)" ← NEW => merged post: "(result <= u16::MAX) && (result == ((data[offset+1] as u16) << 8) | (data[offset] as u16))" Behaviour: - Old prompts without ``functional_spec`` parse identically (back-compat). - Empty/trivial values ("", "true", "1", "n/a", "none") drop cleanly — no "(post) && (true)" noise. - If the defensive post is itself "true", the functional spec replaces it entirely instead of wrapping. - Both ENTRY_SPEC_PROMPT and INTERNAL_SPEC_PROMPT now ask for the optional field with examples (reference equivalence, algebraic laws, round-trip, structural). - Prompts explicitly warn against fabricating functional specs — wrong spec produces false positives — and explicitly say leave empty when not expressible (e.g. parsers returning structured ASTs). Expected effect on CCC sweep: - Byte-helpers: spec asserts "result == manual LE decode"; Kani still finds OOB panics (same as before) AND verifies the bit operations. - align_up_64: spec captures the alignment invariant; Kani may find arithmetic bugs that the defensive spec missed. - Wrappers like bytes_to_str: spec compares to std::str::from_utf8; Kani can prove the wrapper is faithful. This unlocks BMC for functional-correctness verification within the existing architecture. No new dataclasses, no new harness paths. Tests: 6 new unit tests covering missing/empty/trivial/real functional spec parsing. 568 passing total (was 562). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… by default Validation on linker_common/hash.rs + write.rs revealed the LLM was declining to emit functional_spec for every function — pre/post stayed "true" with the original prompt. Hand-injecting a functional spec on align_up_64 ((result >= val) && (result - val < align) && (result % align == 0)) successfully verified functional correctness, proving the wiring works. The LLM's reticence was the only gap. Revised prompt: - Reframe ``functional_spec`` from "optional" to "STRONGLY ENCOURAGED" — the LLM was reading the optional framing as "skip if unsure" by default. - Add 5 concrete spec templates with real Rust syntax: byte readers (u16::from_le_bytes equivalence), alignment (algebraic identity), hashing (fold expression), wrappers (round-trip), bit ops (direct formula). - Note tautological specs are CORRECT and should be emitted, not skipped as obvious — they catch interference / bugs in the function body even when the reference computation mirrors it. - Keep the anti-fabrication rule but reorder: the message is now "emit by default; skip only when genuinely impossible," not "skip by default; emit only when confident." Expected effect on next CCC sweep: more functional specs in spec.json, more verified-clean runs with strengthened postconditions, and potentially new bugs caught by Kani when the implementation diverges from the asserted reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…prompts The dual-spec path (enable_dual_spec=True default) uses CALLER_HEAVY_SPEC_PROMPT and IMPL_HEAVY_SPEC_PROMPT, not the ENTRY_SPEC_PROMPT that previous commits modified. So Phase 1 functional specs were never being generated in practice — the pipeline routes through dual-spec which still asked for only pre/post. Validated by direct LLM query: with the original ENTRY_SPEC_PROMPT the model emits a perfect functional spec (e.g. "result == name.iter().fold(5381u32, |h, &b| h.wrapping_mul(33).wrapping_add(b as u32))" for gnu_hash). In the pipeline the dual-spec path stripped that field because the prompts didn't declare it. Both dual-spec prompts now request functional_spec with the same template-set as ENTRY_SPEC_PROMPT (byte readers, hash folds, alignment identities, wrappers). Tautological-reference framing preserved — the LLM should emit the reference computation even when it mirrors the body. 568 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Live validation on linker_common/write.rs (with API key + functional spec landed) revealed a regression: the LLM emits rich functional specs for state-mutating fns using ``old(buf.len())`` to reference pre-call state. Kani's plain ``kani::assert`` can't see pre-state, and the harness generator doesn't snapshot mutable inputs before the call, so the harness fails to compile with ``E0425: cannot find function 'old'`` and the function is SKIPPED — neither REAL_BUG nor LATENT. Concrete fallout: write.rs went from 3 bugs (align_up_64, pad_to, write_elf64_phdr_at) to 1 bug + 2 skipped harnesses, because the functional spec contained ``old()``. Quick fix: detect ``old\\(`` in the functional spec and drop the spec entirely. Loses some precision (the post stays at the defensive "true") but harnesses compile and the function gets a real verdict. Proper fix (deferred): snapshot mutable inputs in the harness (``let _pre_buf = buf.clone();``) and substitute ``old(X)`` → ``_pre_X`` before passing to Kani. Then state-mutating fns get their full functional specs verified. The regex ``\\bold\\s*\\(`` is word-bounded so it doesn't match identifiers like ``old_threshold`` or ``old_var``. Tests: +3 unit tests (with-old / nested-old / without-old). 571 total (was 568). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 1's previous quick-fix blanket-dropped functional specs containing old() because Kani's plain kani::assert can't see pre-call state. That suppressed useful specs on every state-mutating function (pad_to, write_elf64_phdr, write_elf64_shdr, codegen passes — anywhere the LLM needs to compare pre vs post buffer state). This commit adds the proper substitution: - New helper ``_extract_old_snapshots(postcondition)`` finds each ``old(EXPR)`` call with paren-balanced extraction (handles nesting), generates a snapshot binding ``let _pre_N = (EXPR);``, and rewrites the postcondition to use ``_pre_N`` in place of ``old(EXPR)``. - Slice-indexing expressions (containing ``[``) get ``.to_vec()`` so the snapshot owns the data and survives the about-to-be-mutated borrow. Scalar exprs like ``buf.len()`` stay as-is (assume Copy). - Nested ``old(...)`` is handled: the inner old() is stripped during snapshot generation because at snapshot point everything is already pre-state. So ``old(buf[..old(buf.len())])`` becomes ``let _pre_0 = (buf[..buf.len()]).to_vec();`` and the post substitutes the outer call with ``_pre_0``. - Snapshots are emitted in the harness BETWEEN the precondition assumption and the function call, so they capture in-spec pre-state. - Identifier lookbehind: ``cold(`` and ``_old(`` are not matched — only the ``old(`` form, optionally preceded by whitespace or non-identifier characters. The earlier ``_parse_llm_spec_response`` drop logic is removed since the backend now handles old() natively. Updated 3 tests in test_functional_spec.py from "spec dropped" → "spec preserved"; added 5 new tests in test_kani.py covering the snapshot helper (no-old passthrough, scalar snapshot, slice to_vec, nested old strip, identifier guard). Tests: 576 passing (was 571, +5 new). Expected effect on next write.rs run: pad_to, write_elf64_phdr, write_elf64_shdr no longer SKIPPED for E0425; their functional specs verify against the implementation alongside the defensive checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…positives Phase 1 validation on CCC long_double.rs surfaced a systematic Phase 1 weakness: when the LLM-generated functional_spec contains plain ``+``/``-``/``*`` arithmetic on the function's inputs, Kani evaluates the spec on adversarial nondet values (``usize::MAX``, ``i64::MIN``, etc.) and the SPEC itself overflows. The pipeline classifies that as ``check_<fn>.assertion.N: attempt to add with overflow`` — a CEX in the harness wrapper, not in the function body — and over-reports it as a REAL_BUG. Concrete fallout: long_double.rs reported 4 REAL_BUG (f64_decompose, make_x87_infinity, shift_right_256_with_grs, shifted_limb) — all spec-evaluation overflow, no actual bug in the function bodies. inline_asm.rs reported 1 REAL_BUG (is_generic_gp_constraint) — also spec fabrication (over-simple functional spec). Two-pronged fix: 1. Prompt update (prompts.py) — add explicit "SPEC-EVALUATION OVERFLOW" warning to the entry-spec prompt. Tells the LLM to use ``wrapping_add``/``wrapping_mul`` in functional specs, or guard the precondition to exclude the overflowing inputs. With concrete wrong / right examples. 2. Classifier filter (cex_validator._witness_obvious_artifact) — when the failing property starts with ``check_`` (harness wrapper) AND the trace mentions ``attempt to {add,sub,mul,shift} with overflow``, downgrade to model-artifact (spec-evaluation overflow), not REAL_BUG. This catches false positives even when the LLM ignored the prompt warning. The filter is precise: body-level overflow (``<fn>.assertion.N`` without the check_ prefix) still routes through the REAL_BUG / LATENT classification normally. Postcondition violations with non-overflow traces also pass through as real bugs (those are genuine spec/impl mismatches, like the align_up misalignment finding). Tests: 7 new unit tests in test_spec_overflow_filter.py covering add/sub/mul/shift overflow filtering, body-level overflow not filtered, postcondition violations not filtered, check_*-named fns edge case. 583 passing (was 576, +7 new). This makes Phase 1 noticeably more conservative in practice: it stays silent on inputs the LLM should have guarded, instead of producing false-positive bug reports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Comprehensive writeup of the session: 25 real bugs found in CCC (24 defensive panic-class + 1 functional semantic), 4+ functional correctness proofs (hash.rs full djb2/SysV equivalence, alignment invariants on align_up_64, ELF byte-layout on write_elf64_phdr/shdr), 5 Phase 1 false positives (inline_asm + 4 long_double). Includes: full bug table organized by file + function class, list of 13 bmc-agent improvements landed this session (with commit hashes), what works well vs what doesn't, threat-model framing clarification, and the methodology observation about caller-guards-match-inferred-pre. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… positives Re-running long_double.rs with --enable-realism-check confirmed the hypothesis: all 3 Phase 1 finds (f64_decompose, shift_right_256_with_grs, make_x87_infinity) were verdicted UNREALISTIC with high confidence by the realism check. CLI suppression dropped raw count 3 → final count 0. Concrete realism reasoning examples: - f64_decompose: "CBMC assertion violation with no counterexample witness values recorded" - shift_right_256_with_grs: "verification artifact from CBMC's symbolic execution, not a real bug" - make_x87_infinity: identified the x87 byte-layout convention mismatch correctly Updated the session summary with: - Validated mitigation (realism-check works, not just theoretical) - Recommended invocation for future CCC runs Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T11:44:28Z

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)

[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

theyoucheng and others added 3 commits May 18, 2026 21:05

theyoucheng and others added 2 commits May 18, 2026 21:29

theyoucheng added 4 commits May 19, 2026 02:52

theyoucheng added 2 commits May 19, 2026 03:25

theyoucheng added 2 commits May 19, 2026 10:17

theyoucheng added 2 commits May 19, 2026 10:53

theyoucheng and others added 13 commits May 19, 2026 11:25

theyoucheng merged commit a3e6bf5 into main May 19, 2026
1 check passed

Uh oh!

Conversation

theyoucheng commented May 18, 2026

Self-test for bmc-agent-pr action

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 18, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

github-actions Bot commented May 19, 2026

bmc-agent results

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant