Skip to content

selftest: remove bounds check in safe_double#1

Merged
theyoucheng merged 39 commits into
mainfrom
test/bmc-action-selftest-1
May 19, 2026
Merged

selftest: remove bounds check in safe_double#1
theyoucheng merged 39 commits into
mainfrom
test/bmc-action-selftest-1

Conversation

@theyoucheng

Copy link
Copy Markdown
Contributor

Self-test for bmc-agent-pr action

This PR removes the bounds check in examples/bmc-action-selftest/safe_double.c to trigger the bmc-agent self-test workflow.

Expected behaviour:

  • The bmc-agent self-test workflow fires on this PR.
  • The action preprocesses the changed file, runs bmc-agent verify with --enable-flag-selection.
  • The flag-selector picks --signed-overflow-check (and/or --unsigned-overflow-check).
  • CBMC surfaces safe_double.overflow.1 — signed integer overflow on x * 2 for large x.
  • A comment is posted on this PR summarising the finding.

Validation goal: confirm the action runs in an actual GitHub Actions runner (not just locally in Docker as in commit 327f628).

theyoucheng and others added 3 commits May 18, 2026 21:05
Triggers the bmc-agent self-test workflow. Expected outcome: a PR
comment from the action reporting REAL_BUG on safe_double, property
safe_double.overflow.1, root cause "signed integer overflow on
x * 2 for x near INT_MAX".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
action.yml cannot use ${{ github.token }} expressions in its env block
— that context only resolves inside workflow files. Trying to do so
fails the runner with "Unrecognized named-value: 'github'" at action
load time (live-runner test 2026-05-18).

Fix: declare github-token as an explicit input on the action, mapped
to GITHUB_TOKEN in the env block via ${{ inputs.github-token }}. The
calling workflow now passes ${{ secrets.GITHUB_TOKEN }} (or a PAT)
through that input.

Updated:
  action.yml — new github-token input.
  .github/workflows/bmc-agent-selftest.yml — pass github-token.
  actions/bmc-agent-pr/example-workflow.yml — same.
  actions/bmc-agent-pr/README.md — document the input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous fix replaced the env-block expression but left a stray
${{ github.token }} inside the description string of the github-token
input. GitHub Actions evaluates expressions in descriptions too, so
the runner still failed with 'Unrecognized named-value: github' at
load time. Rephrase as plain `github.token` code-tick to avoid the
${{ sequence entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

No real_bug verdicts across 1 changed file(s).

All verdicts (\(0))
[]

Two changes that turn the action's previous black-box failure mode
into a debuggable one:

1. entrypoint.sh now dumps a 40-line tail of the bmc-agent log even
   on rc=0. The first live-runner self-test (run 26048828588)
   reported 0 real_bug findings in 1s — way too fast for any real
   LLM call. With rc=0 the previous entrypoint suppressed the log
   entirely, making silent-failure indistinguishable from clean run.

2. Workflow uploads .bmc-agent-artifacts/ via actions/upload-artifact
   so the per-function spec/harness/CEX/log directory survives the
   ephemeral runner for offline triage.

Also fixes a cosmetic markdown bug in the PR comment: literal \(0)
appeared where the verdict count should have been due to over-eager
backslash-escaping inside the heredoc; now uses an explicit shell
variable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

No real_bug verdicts across 1 changed file(s).

All verdicts (0)
[]

Self-test runs report LLM 401 Missing Authentication on every Phase 1
spec generation, even with secrets.BMC_AGENT_LLM_KEY supplied via
the action input. Add a diagnostics group to the entrypoint that
prints presence and length (NOT the value) of ANTHROPIC_API_KEY,
BMC_AGENT_LLM_BASE_URL, BMC_AGENT_LLM_MODEL, and GITHUB_TOKEN.
The value-prefix in the log is just three chars (e.g. 'sk-') so
secret-redaction continues to mask the rest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

No real_bug verdicts across 1 changed file(s).

All verdicts (0)
[]

theyoucheng and others added 2 commits May 18, 2026 21:29
The previous secret upload (gh secret set --body -) somehow stored
only 1 char of the key; the action's diagnostic group showed
ANTHROPIC_API_KEY length=1 inside the container. Re-uploaded the
73-byte OpenRouter key via stdin redirect (gh secret set < /tmp/key)
which preserves the full value. This empty commit triggers a fresh
workflow run that should now see the full key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 r8125_fiber hunt surfaced rtl8125_check_fiber_mode_support
as a candidate REAL_BUG via property overflow.1 — CBMC's
--conversion-check flagged the u32->u8 cast in
  u8 tmp = (u8)rtl8125_mac_ocp_read(tp, 0xD006);
even though the cast is explicit driver-IO code taking the low byte
of a 32-bit register read. This is intentional, defined C behavior,
not a bug.

Add a realism-stage witness detector that recognizes the pattern:
CBMC description matches 'arithmetic overflow on <kind> to <kind>
type conversion in (T)expr' AND target type T is in a narrow-int
allowlist (u8/u16/u32, __u8/__u16/__u32, uint8_t/uint16_t/uint32_t,
char/short, etc.). Detector returns UNREALISTIC without an LLM call,
matching the pattern used by the USB-serial, PHY, and netdev-private
detectors.

Conservative — we don't fire on user-defined narrowing target types
where unintended narrowing might be the actual bug.

Also includes the MMIO-region-size harness fix landed (accidentally)
in 00da727 — void *mmio_addr now gets a 4 KiB backing region instead
of the 1-byte __CPROVER_r_ok(p, sizeof(void)) that produced spurious
OOB on legitimate register accesses at offsets > 1 (r8125_dash FP).

+8 regression tests; 498 passing total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

1 similar comment
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

llama.cpp ggml-alloc.c sweep (2026-05-18) found that every one of 87
functions failed CBMC parsing with 'redefinition of body of struct
_IO_FILE' (exit code 6) — the preprocessed source contains the full
glibc internal struct body, and CBMC's own libc redefines it when the
harness references FILE/pthread_mutex_t/etc.

Add _strip_glibc_internal_struct_bodies pass that converts
  struct _IO_FILE { ... };
into
  struct _IO_FILE; /* glibc-internal body stripped */
while leaving project-defined structs alone. The body strip:
  - matches names starting with _IO_, __, or _G_ (glibc-internal
    convention; kernel uses __kernel_* etc. but kernel-mode is a
    no-op so those survive);
  - walks brace depth correctly through nested unions / anonymous
    structs / arrays (struct __pthread_mutex_s has nested members);
  - is comment- and string-literal-aware so it doesn't trip on
    struct-keyword occurrences inside doc text.

Wired into the harness pipeline alongside _strip_glibc_internal_typedefs
in two places (real-libc dynamic harness path + CBMC verification
harness path).

+4 regression tests; 502 passing total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

…list

Re-run of ggml-alloc.c with the prefix-only strip (commit 80e3b3f)
still produced 95 CBMC parse errors — glibc structs without _IO_/__/_G_
prefix survived (timeval, timespec, random_data, drand48_data).

Add an allowlist of known POSIX/glibc struct names that CBMC's
built-in libc redefines: timeval/timespec/itimerval/itimerspec, tm,
stat/stat64, sockaddr family + msghdr/iovec, addrinfo/hostent,
lconv, rusage/rlimit, passwd/group, dirent/dirent64,
sigaction/siginfo_t, flock, termios, utsname, random_data/drand48_data,
sched_param, fpos_t. List grown empirically from observed
'redefinition of body of struct X' errors; expand as more surface.

+1 regression test (503 total).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

First real-OSS sweep result from the autonomous overnight work. 87
functions in ggml/src/ggml-alloc.c; 22 completed a full BMC + Phase 3
cycle and verified clean of any real bug; 26 hit CBMC parse errors
(struct/typedef redefinition conflicts even with --real-libc); 0
confirmed real bugs.

Most informative candidate: aligned_offset's offset+align unsigned
overflow, surfaced by --unsigned-overflow-check, then iteratively
constrained-out by the feedback loop after it learned that all
callers bound offset by max_buffer_size. Demonstrates the realism +
feedback chain distinguishing theoretical-overflow from
exploitable-overflow on a recognized AI-inference codebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

Two issues uncovered by the ggml.c run (all 439 functions failed CBMC
parse/convert; 0 verdicts produced but the pipeline still emitted
"AMC Pipeline END: 0 real bug(s) found" — ambiguous between
clean-verify and total parse failure).

1. Coverage diagnostics. AMCPipeline.run() now scans every saved
   cbmc_result.json after Phase 4 and aggregates failure modes:
   undefined symbols (build-config macros like GGML_VERSION), bad
   assert arity, struct-syntax errors, and other parse/conv errors.
   Writes coverage_diagnostics.json into the driver dir and logs a
   single ::warning block when ≥1 function failed pre-verdict. When
   ≥50% of functions failed, escalates to "treat this run as BLOCKED,
   not as a clean verify."

2. owns(scope, ptr) DSL form. The existing _OWNS_RE matched the
   2-arg form with group(1)="ctx, a" and produced
   `assert(ctx, result != NULL)` which CBMC rejects as a 2-arg
   assert. Two-group regex now drops the scope and keeps only the
   pointer. Affected ~34 ggml.c harnesses in the just-completed run.

ggml.c failure-mode writeup at findings/oss-llamacpp/. ggml-quants.c
re-run in flight uses the fixed code path.

+7 tests; 510 passing.
Two issues uncovered by the ggml-quants.c run: 3 confirmed_dynamic
SIGSEGV findings that were all the same FP class, plus a tree-sitter
misparse that masked struct return types in ggml.c.

1. Address-taken detector in cex_validator. The classifier promoted
   iq1_sort_helper / iq2_compare_func / iq3_compare_func to real_bug
   because _find_callers() returned empty (the only "callers" are
   qsort(..., fn) which the call graph misses). The dynamic
   validator then ran them with NULL pointers and got SIGSEGV.
   New _is_address_taken() scans the source (with comments and
   string literals neutralised) for any reference to the function
   name not followed by '('. When found, the no-callers path now
   returns UNRESOLVED instead of REAL_BUG, with reasoning citing
   the library-callback indirection. Generalises to qsort/bsearch/
   pthread_create/atexit/signal and struct function-pointer tables.

2. Tree-sitter recovers struct keyword through macro prefix. For
   "GGML_API struct ggml_tensor * fn(...)", tree-sitter consumes
   "GGML_API struct" as a stray declaration and the function's
   type field becomes the bare tag "ggml_tensor *". The harness
   then writes "ggml_tensor *result = ..." which CBMC rejects
   because ggml_tensor is not a typedef. Parser now scans backwards
   from the type node's start byte, over whitespace only and stopping
   at any statement separator, and recovers struct/union/enum if
   present. Affected 2 functions in the ggml.c run; the pattern
   hits any C library that uses an export macro plus tag types.

ggml-quants clean-verify writeup at findings/oss-llamacpp/.
+9 tests; 519 passing.
ggml-cpu/quants.c run blocked at 43/43 with ``syntax error before '=''``
because every signature uses ``T * GGML_RESTRICT s`` and tree-sitter
parsed ``GGML_RESTRICT`` as the parameter name, dropping the real ``s``
into a sibling ERROR node. The harness then declared three local
variables all named ``GGML_RESTRICT`` and CBMC rejected the redefinition.

When (a) the parsed parameter name looks like a qualifier macro
(ALL_CAPS-with-underscore, or leading ``__``) and (b) the next sibling
in the parameter list is an ERROR node containing a single identifier,
fold the macro into the type string and use the ERROR identifier as
the parameter name. Conservative — single-letter caps (no underscore,
no leading ``__``) are not treated as macros.

Generalises beyond ggml: same pattern hits any C codebase using
__restrict__, __nonnull__, project-local NORETURN/HOT/COLD attribute
macros, etc.

+2 tests; 521 passing.
ggml-cpu/quants.c run produced 7 confirmed_dynamic findings on the
quantize_row_* family, all of the form:

    void quantize_row_q5_K(... , int64_t k) {
        assert(k % QK_K == 0);   // <-- fires under CBMC
        ...
    }

QK_K is a #define'd block-size constant (256). The source-assert
extractor already promotes ``assert(expr)`` at the top of a function
to ``__CPROVER_assume(expr)`` when ``expr`` only references the
function's parameters and a small allowlist of stdint maxes — but
QK_K wasn't on the allowlist, so the assert wasn't promoted and
CBMC explored k=257 and friends, tripping the source assert.

Allow any ALL_CAPS_WITH_UNDERSCORE identifier as a stable
compile-time constant (the conventional C name shape for #defines
and enum members). Two negative tests guard against lowercase free
identifiers and __builtin_/_Static_ names being swallowed.

Generalises beyond ggml: PAGE_SIZE, L1_CACHE_BYTES, BLOCK_SIZE,
MAX_USERS, etc. — the precondition-assert idiom is ubiquitous in
Linux kernel modules and embedded C codebases.

ggml-cpu/quants.c clean-verify writeup at findings/oss-llamacpp/.
+3 tests; 524 passing.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

C source files commonly define a function twice under a preprocessor
``#ifdef X /* real */ #else /* stub */ #endif``. Tree-sitter parses
both branches because it doesn't process the preprocessor; without
disambiguation, the SECOND (stub) definition overwrites the first
(real) one in our functions/call_graph/function_bodies maps. The
real body's callees vanish, so any function the real body called
gets mis-classified as a system-entry point.

Regression: curl/parsedate.c run 2026-05-19 reported ``datenum`` and
``time2epoch`` as confirmed_system_entry findings because the 3-stmt
``#else`` stub for ``parsedate`` overwrote the 30-stmt real body
and dropped both functions from the call graph.

Now: when a name is seen a second time, keep the entry with the
longer body. Stubs are short (``{ (void)x; return 0; }``); real
implementations are multi-statement. First-wins on ties so the rule
is deterministic in file order.

+2 tests; 526 passing.
17/17 functions verified after the duplicate-definition fix. The
v2 run produced 2 confirmed_system_entry false positives on
``datenum`` and ``time2epoch`` because tree-sitter parsed both
branches of the ``#ifndef CURL_DISABLE_PARSEDATE / #else / #endif``
guard and the stub body overwrote the real one, dropping
``parsedate``'s callees from the graph. After the parser fix the
v3 run is fully clean.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

curl's ``UNITTEST CURLUcode parse_port(...)`` (where ``UNITTEST``
expands to ``static`` or empty) was mis-parsed: tree-sitter treated
``UNITTEST`` as the entire return type and stashed ``CURLUcode`` in
a sibling ERROR node. The harness emitted ``UNITTEST result =
parse_port(...);``, which after CPP became ``static result = ...``
and CBMC rejected with "expected constant expression".

Mirror the GGML_RESTRICT param-name recovery for return types:
when the parsed type looks like a storage-class macro and the
function_definition has an ERROR sibling containing a single
identifier, fold the macro into the type prefix and use the ERROR
identifier as the actual return type. Also extend the macro-detection
heuristic to cover single-word uppercase macros of ≥4 chars
(UNITTEST, EXPORT, INLINE) in addition to the existing
ALL_CAPS_WITH_UNDERSCORE shape.

Regression: curl/urlapi.c run 2026-05-19, parse_port harness failed
CBMC with the expected-constant error. Generalises to any C codebase
using project-local linkage macros (OpenSSL ``OSSL_API``, kernel
``__init``/``__cold``, etc.).

+2 tests; 528 passing.

urlapi.c clean-verify writeup at findings/oss-curl/.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

CCC encoding.rs (Anthropic's Claude-written C compiler, 2.7k stars,
96% Rust) is the first real-world AI-generated Rust target we ran
bmc-agent against. Two of 4 functions in the encoding utility module
(bytes_to_string, encode_non_utf8) timed out at the default Kani
slice_bound=4 / unwind=4 / timeout=120s because each input byte
fans out into String::from_utf8 / std::str::from_utf8 internal
state-space exploration. Verified experimentally that the same
encode_non_utf8 harness regenerated with slice_bound=1, unwind=2
verifies clean in ~40 s.

Changes:
- KaniBackend.generate_harness gains an optional
  slice_bound_override kwarg.
- KaniBackend.check gains unwind_override / timeout_override.
- BMCEngine.check_function (Rust path only) detects "kani timed out",
  regenerates the harness with (slice_bound=2, unwind=4), then
  (slice_bound=1, unwind=2), saving each retry harness and re-checking.
  Stops as soon as a non-timeout verdict lands.

Each retry overwrites the saved harness so the artifact reflects the
configuration that actually produced the recorded verdict, keeping
classification/realism stages downstream consistent with the
counterexample they're reasoning about.

+2 tests; 530 passing.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

Two interlocking fixes that unblock standalone Kani verification on
real-world Rust crates whose source files reference crate-local
types. CCC const_arith.rs run 2026-05-19 — 3 selected primitive
functions (wrap_result, unsigned_op, bool_to_i64) all failed Kani
parse because the file's preamble pulled in two ``use crate::*;``
lines AND a dozen sibling fns referenced those imports in their
signatures.

1. _strip_crate_local_use_statements
   Comment out any ``use crate::*``, ``use super::*``, ``use self::*``
   (and ``pub use`` variants) in the included full source. std::,
   core::, alloc::, and external crate paths are untouched.

2. _strip_crate_local_fn_items
   Drop every top-level fn from the included source EXCEPT the
   target and any names in the recorded callee set. Sibling fns
   routinely reference the SAME types brought in by the now-stripped
   use lines (``op: &BinOp``) and would fail rustc parse with E0412.
   Tree-sitter-based — regex isn't reliable for brace matching in
   Rust (lifetimes, generics, raw strings).

Module-level consts, type aliases, and external use statements stay
intact. The harness now has: preamble + target fn + listed callees.

+4 tests; 535 passing.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

…eout

CCC source.rs v2 — 4 pure top-level fns all hit
"loop unwind bound exhausted" at the default unwind=4 because each
function has at least one inner loop that runs up to slice_bound (4)
iterations PLUS some constant overhead. Kani's heuristic needs
unwind ≥ loop-bound + 1; 4 is borderline and routinely fails.

Add a separate retry path for unwind-exhausted: one shot at
unwind=max(4×default, 16), keeping slice_bound intact. Distinct
from the timeout-retry chain (which shrinks state). The chain now is:
  1. Default (unwind=4, bound=4, 120 s)
  2. On "unwind bound exhausted": bump unwind to 16
  3. On "timed out": (separate path) shrink bound 2 → 1, tighten unwind

CCC source.rs v3 with both retries: 3/4 fns verified clean, 1
inconclusive. v2 had 4/4 failed at parse/Kani. compute_line_offsets
exercises BOTH retries — unwind-bump first lands a timeout, then
shrink-retry verifies clean.
The fn-stripper from 9e31a9f covered top-level free fns but missed
impl blocks. impl methods reference module-local sibling fns that
the stripper just removed, so source files with impls cascade into
E0425 ("cannot find function") even after the use-strip pass.

Rust parser M1 scope only analyses top-level free fns, so any impl
block in the file is by definition NOT the target. Strip every
impl_item whole. Module-level structs, enums, type aliases, and
consts stay intact.

Regression: CCC source.rs 2026-05-19 — 4 pure top-level fns all
failed Kani parse because impl Span / impl SourceManager bodies
referenced compute_line_offsets / memchr_newline / FxHashMap after
those fn defs were stripped.

+1 test; 536 passing.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

1 similar comment
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

bmc-agent Rust path validation: first real-bug findings from the
AI-generated CCC after this session's 5 Rust-pipeline fixes.

1. bytes_to_str (frontend/preprocessor/utils.rs:39) — slice-index
   panic when start > end or end > bytes.len(). pub fn with no
   documented precondition; CBMC trips
   core::slice::index::slice_index_fail. All ~9 in-tree callers
   guard the invariant in surrounding parser loops, so the panic
   is latent — but exposed as a pub API footgun.

2. skip_literal_bytes (utils.rs:46) — `let mut i = start + 1;`
   overflows when start == usize::MAX. Debug build: panic
   ("attempt to add with overflow", CBMC's assertion.1). Release
   build: wraps to 0 and the preprocessor silently advances
   from a wrong position. saturating_add or a precondition assert
   would fix.

The other 2 fns in the file (copy_literal_bytes_raw, …_to_string)
were blocked by an &mut-parameter limitation in the Kani harness
generator — filed as next-up bmc-agent improvement.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

CCC preprocessor utils.rs had 2 fns blocked at harness gen by
``&mut references in Kani harnesses are not yet supported``:
copy_literal_bytes_raw(... result: &mut Vec<u8>) and
copy_literal_bytes_to_string(... result: &mut String). Both are
byte-index helpers in the AI-generated CCC preprocessor with the
same Rust panic shape as the 2 v1-flagged fns in the same file.

Added init blocks for three concrete &mut shapes:
- &mut Vec<T>: ``let mut _owned = Vec::new(); &mut _owned``
- &mut String: ``let mut _owned = String::new(); &mut _owned``
- &mut <primitive>: ``let mut _owned = kani::any(); &mut _owned``

Anything else (&mut UserStruct) still hits the explicit
NotImplementedError — we don't try to nondet-init arbitrary
user-defined structs.

v2 prepu run after the fix: 4/4 bug-prone fns produced verdicts,
4 real Rust panics confirmed (slice OOB + usize overflow) across
the file's 8 pub byte-index helpers. Writeup updated.

+3 tests; 539 passing.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

The single-line regex caught ``use super::utils::{`` but stopped at
the first newline, leaving the closing ``};`` and the wrapped
identifier list as orphan code. Every harness for CCC macro_defs.rs
failed rustc with "unexpected closing delimiter".

Switch to MULTILINE | DOTALL so ``[^;]*`` spans newlines until the
terminating ``;``, and wrap the whole match in a ``/* ... */`` block
comment (single line comments cannot span the body of a multi-line
use). Replace also escapes any embedded ``*/`` to avoid prematurely
closing the wrapper.

+1 test (multi-line case); +1 test rewritten for block-comment shape;
540 passing.
Crate-internal visibility restrictors fail standalone compilation
with E0433 ("there are too many leading `super` keywords").
Replace pub(super), pub(crate), pub(self), and pub(in <path>) with
plain pub. The harness loses the host-crate-specific scope
restriction, but that restriction is meaningless in standalone
compilation anyway and the type's accessibility to the harness is
what we need.

Regression: CCC macro_defs.rs 2026-05-19 — every struct field
used pub(super) and rustc rejected every harness with E0433
before Kani saw any of them.

+1 test; 541 passing. macro_defs.rs still hits a deeper
cross-file-import problem (calls fns defined in utils.rs that
won't resolve without cross-module stub generation) — separate
follow-up.
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

1 similar comment
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

theyoucheng and others added 13 commits May 19, 2026 11:25
…urvive sibling-strip

The harness generator's sibling-strip step kept only DIRECT callees of
the target fn (``func.callees``). Indirectly reachable helpers got
stripped and the harness failed to compile.

Concrete trigger: CCC src/backend/asm_expr.rs has a chain of mutually
recursive descent-parser fns — eval_add -> eval_mul -> eval_unary ->
eval_tokens -> eval_or -> eval_xor -> eval_and -> eval_shift -> eval_add.
eval_add's direct callees are {Ok, eval_mul, tokens.len}, so eval_mul
was kept but eval_unary (called by eval_mul) was stripped, breaking
compilation with E0425 "cannot find function eval_unary". 11 of 12
fns in the file failed verification pre-verdict.

Fix: compute transitive closure of callees against parsed_file.call_graph
before passing keep_callees to the stripper. With this, all 8 eval_*
helpers survive eval_add's harness; unrelated siblings
(char_escape_value, tokenize_expr, parse_*) are still stripped.

Tests: 3 new unit tests (closure walk, cycle termination, None fallback).
544 total tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The slice/Vec param-init paths used ``[T; N] = kani::any()``, which
requires ``T: kani::Arbitrary``. Kani auto-derives Arbitrary for primitive
types and tuples thereof, but NOT for user-defined enums/structs.

Concrete trigger: CCC asm_expr.rs has fns taking ``&[ExprToken]`` where
``ExprToken`` is a local enum with an ``Op2(&'static str)`` variant.
Trying to nondet-init the backing array fails with
``E0277 the trait bound ExprToken: kani::Arbitrary is not satisfied``.
Surfaced AFTER the transitive-callee fix unblocked function-resolution.

Fix: when the element type isn't a known-Arbitrary type (primitive),
emit an empty Vec/slice backing instead. The function still gets a
verdict — degenerate (only exercises the empty-input branch) but better
than a build failure. Behaviour unchanged for primitive-element slices.

Tests: 4 new unit tests (slice-of-user-type, &mut [T] of user-type,
Vec<UserEnum>, sanity check that &[u8] still uses kani::any). 548 tests
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t / spurious)

Previously the validator collapsed two distinct cases into REAL_BUG:
(a) panic reachable from an in-tree call site, and (b) panic reachable
only via the public API surface — cargo-fuzz / future-caller risk, but
no current call site produces the state. This over-reported the second
case as "active crashes" when in fact every in-tree caller maintains
the missing precondition via surrounding invariants (parser-loop
invariants, ELF parse-bounds checks).

The user (2026-05-19) asked for explicit separation: reachable bugs
should be REAL_BUG; latent panics on public API should be LATENT, with
a separate output stream so triage can pick severity tier.

Implementation:

- New ``CExOutcome.LATENT = "latent"`` enum value.
- ``_is_structural_panic(failing_property, trace)`` — substring match
  against Rust/C panic markers (slice OOB, integer overflow, divide-by-
  zero, capacity_overflow, …). Rejects custom postcondition violations
  (those are LLM-spec artifacts, not pure panic sites).
- ``_is_publicly_callable(func)`` — duck-typed against Rust
  ``is_pub`` and C ``not is_static``.
- ``ValidationResult.is_latent_bug`` property + ``is_latent_bug`` key
  in ``to_dict()``.
- ``_handle_spurious``: when refinement stabilises on a publicly-
  callable fn with a structural-panic CEx, upgrades SPURIOUS → LATENT
  instead of silently dropping the finding.
- Entry-function branch (no in-scope callers): same upgrade rule.
  Previously called REAL_BUG with ``confirmed_system_entry``; this
  over-reported. Non-public entry points still go to REAL_BUG
  (conservative — kernel handlers etc.).
- ``BugReporter.save_latent_report()`` + ``ArtifactStore.save_latent_
  report()`` write to ``{driver}/{fn}/latent_report.json`` next to the
  existing ``bug_report.json``.
- Pipeline routes LATENT outcomes in all 3 validate sites (Phase 3,
  Phase 3b recheck, Phase 3c CEGAR). Stashed on ``pipeline.latent_
  reports`` for CLI access.
- CLI ``verify`` summary prints separate ``Latent panics on pub API: N``
  block with property + function name.

Tests: 12 new unit tests covering the structural-panic substring
matrix, public-API detection across Rust/C signatures, and the
ValidationResult LATENT flag. Total: 560 passing (was 548).

This is a methodology change, not a heuristic tweak. Existing
real-bug-only consumers (eval scripts, baselines) are unchanged
because ``is_real_bug`` still means ``outcome == REAL_BUG``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nic markers

The LATENT marker list missed Rust stdlib panic helpers for failed
``unwrap()`` / ``expect()`` on Option/Result. The property name lands
as ``std::result::unwrap_failed.assertion.N`` or
``core::option::expect_failed.assertion.N`` — both are pure panic
sites reachable via cargo-fuzz / future-caller, so they should route
to LATENT (not REAL_BUG) when the entry-function branch fires.

Regression: CCC ``bytes_to_str`` and ``copy_literal_bytes_to_string``
in utils.rs v3 (2026-05-19) — both have unwrap() on
``str::from_utf8`` and both got REAL_BUG verdicts that should have
been LATENT under the new classification (they're public-API fns
with no in-tree caller producing the bad-UTF-8 state).

Tests: +1 unit test (13 total in test_latent_classification).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit (c3864af) routed all pub-API-reachable structural
panics to LATENT regardless of threat model. User feedback 2026-05-19:
under threat_model='security' (the default), the attacker IS a current
caller — adversarial inputs cross the public API boundary, so a panic
reachable from the pub API IS a real bug, not "future-caller risk."

The LATENT bucket only makes sense under non-security threat models
(safety, functional) where we explicitly care about crashes in current
in-tree call paths rather than worst-case input fuzzing.

Behavior:

- threat_model='security' (default): pub API + structural panic + no
  in-tree caller → REAL_BUG. Matches the cargo-fuzz standard the user
  originally set ("if a fuzzer can trigger it, it's a real bug"). The
  reasoning string explicitly mentions threat model so triage knows
  why the bug was promoted.
- threat_model='safety' / 'functional': same conditions → LATENT.
  The hardening bucket for in-tree-unreachable but pub-API-reachable
  panics.
- Non-public entries always go to REAL_BUG regardless of threat model.

Applied at both LATENT decision points: the entry-function branch
(no in-scope callers) and the _handle_spurious refinement-stabilised
branch.

Tests: +1 unit test verifying the Config.threat_model field is
honored and defaults to 'security'. 562 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e ones

User feedback 2026-05-19: BMC is more expressive than what the current
spec generation uses it for. The DSL and prompts are entirely defensive
— "what preconditions does this function need to be safe?" — which uses
a fraction of what Kani/CBMC can prove. BMC can also verify functional
correctness ("result == reference_computation"), algebraic identities
("align_up_64(v, a) % a == 0"), round-trip properties, structural
invariants. We were leaving that power on the table.

Phase 1: add an optional ``functional_spec`` field to the LLM response
JSON. When the LLM can express what a CORRECT implementation would
compute as a Rust/C boolean expression, it provides that spec; the
parser AND-merges it into the postcondition so downstream harness gen,
classification, refinement, and reporting consume it without any
further changes.

  precondition: "data.len() >= offset + 2"   ← unchanged (defensive)
  postcondition: "result <= u16::MAX"        ← unchanged (defensive)
  functional_spec: "result == ((data[offset+1] as u16) << 8)
                            | (data[offset] as u16)"  ← NEW
  =>
  merged post: "(result <= u16::MAX)
              && (result == ((data[offset+1] as u16) << 8)
                          | (data[offset] as u16))"

Behaviour:

- Old prompts without ``functional_spec`` parse identically (back-compat).
- Empty/trivial values ("", "true", "1", "n/a", "none") drop cleanly —
  no "(post) && (true)" noise.
- If the defensive post is itself "true", the functional spec replaces
  it entirely instead of wrapping.
- Both ENTRY_SPEC_PROMPT and INTERNAL_SPEC_PROMPT now ask for the
  optional field with examples (reference equivalence, algebraic laws,
  round-trip, structural).
- Prompts explicitly warn against fabricating functional specs — wrong
  spec produces false positives — and explicitly say leave empty when
  not expressible (e.g. parsers returning structured ASTs).

Expected effect on CCC sweep:
- Byte-helpers: spec asserts "result == manual LE decode"; Kani still
  finds OOB panics (same as before) AND verifies the bit operations.
- align_up_64: spec captures the alignment invariant; Kani may find
  arithmetic bugs that the defensive spec missed.
- Wrappers like bytes_to_str: spec compares to std::str::from_utf8;
  Kani can prove the wrapper is faithful.

This unlocks BMC for functional-correctness verification within the
existing architecture. No new dataclasses, no new harness paths.

Tests: 6 new unit tests covering missing/empty/trivial/real functional
spec parsing. 568 passing total (was 562).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… by default

Validation on linker_common/hash.rs + write.rs revealed the LLM was
declining to emit functional_spec for every function — pre/post stayed
"true" with the original prompt. Hand-injecting a functional spec on
align_up_64 ((result >= val) && (result - val < align) &&
(result % align == 0)) successfully verified functional correctness,
proving the wiring works. The LLM's reticence was the only gap.

Revised prompt:
- Reframe ``functional_spec`` from "optional" to "STRONGLY ENCOURAGED" —
  the LLM was reading the optional framing as "skip if unsure" by default.
- Add 5 concrete spec templates with real Rust syntax:
  byte readers (u16::from_le_bytes equivalence), alignment
  (algebraic identity), hashing (fold expression), wrappers
  (round-trip), bit ops (direct formula).
- Note tautological specs are CORRECT and should be emitted, not
  skipped as obvious — they catch interference / bugs in the
  function body even when the reference computation mirrors it.
- Keep the anti-fabrication rule but reorder: the message is now
  "emit by default; skip only when genuinely impossible," not
  "skip by default; emit only when confident."

Expected effect on next CCC sweep: more functional specs in spec.json,
more verified-clean runs with strengthened postconditions, and
potentially new bugs caught by Kani when the implementation diverges
from the asserted reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prompts

The dual-spec path (enable_dual_spec=True default) uses CALLER_HEAVY_SPEC_PROMPT
and IMPL_HEAVY_SPEC_PROMPT, not the ENTRY_SPEC_PROMPT that previous commits
modified. So Phase 1 functional specs were never being generated in practice —
the pipeline routes through dual-spec which still asked for only pre/post.

Validated by direct LLM query: with the original ENTRY_SPEC_PROMPT the model
emits a perfect functional spec (e.g. "result == name.iter().fold(5381u32,
|h, &b| h.wrapping_mul(33).wrapping_add(b as u32))" for gnu_hash). In the
pipeline the dual-spec path stripped that field because the prompts didn't
declare it.

Both dual-spec prompts now request functional_spec with the same template-set
as ENTRY_SPEC_PROMPT (byte readers, hash folds, alignment identities,
wrappers). Tautological-reference framing preserved — the LLM should emit
the reference computation even when it mirrors the body.

568 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live validation on linker_common/write.rs (with API key + functional
spec landed) revealed a regression: the LLM emits rich functional
specs for state-mutating fns using ``old(buf.len())`` to reference
pre-call state. Kani's plain ``kani::assert`` can't see pre-state,
and the harness generator doesn't snapshot mutable inputs before the
call, so the harness fails to compile with ``E0425: cannot find
function 'old'`` and the function is SKIPPED — neither REAL_BUG
nor LATENT.

Concrete fallout: write.rs went from 3 bugs (align_up_64, pad_to,
write_elf64_phdr_at) to 1 bug + 2 skipped harnesses, because the
functional spec contained ``old()``.

Quick fix: detect ``old\\(`` in the functional spec and drop the
spec entirely. Loses some precision (the post stays at the defensive
"true") but harnesses compile and the function gets a real verdict.

Proper fix (deferred): snapshot mutable inputs in the harness
(``let _pre_buf = buf.clone();``) and substitute ``old(X)`` →
``_pre_X`` before passing to Kani. Then state-mutating fns get
their full functional specs verified.

The regex ``\\bold\\s*\\(`` is word-bounded so it doesn't match
identifiers like ``old_threshold`` or ``old_var``.

Tests: +3 unit tests (with-old / nested-old / without-old). 571
total (was 568).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1's previous quick-fix blanket-dropped functional specs containing
old() because Kani's plain kani::assert can't see pre-call state. That
suppressed useful specs on every state-mutating function (pad_to,
write_elf64_phdr, write_elf64_shdr, codegen passes — anywhere the LLM
needs to compare pre vs post buffer state).

This commit adds the proper substitution:

- New helper ``_extract_old_snapshots(postcondition)`` finds each
  ``old(EXPR)`` call with paren-balanced extraction (handles nesting),
  generates a snapshot binding ``let _pre_N = (EXPR);``, and rewrites
  the postcondition to use ``_pre_N`` in place of ``old(EXPR)``.

- Slice-indexing expressions (containing ``[``) get ``.to_vec()`` so
  the snapshot owns the data and survives the about-to-be-mutated
  borrow. Scalar exprs like ``buf.len()`` stay as-is (assume Copy).

- Nested ``old(...)`` is handled: the inner old() is stripped during
  snapshot generation because at snapshot point everything is already
  pre-state. So ``old(buf[..old(buf.len())])`` becomes
  ``let _pre_0 = (buf[..buf.len()]).to_vec();`` and the post substitutes
  the outer call with ``_pre_0``.

- Snapshots are emitted in the harness BETWEEN the precondition
  assumption and the function call, so they capture in-spec pre-state.

- Identifier lookbehind: ``cold(`` and ``_old(`` are not matched —
  only the ``old(`` form, optionally preceded by whitespace or
  non-identifier characters.

The earlier ``_parse_llm_spec_response`` drop logic is removed since
the backend now handles old() natively. Updated 3 tests in
test_functional_spec.py from "spec dropped" → "spec preserved";
added 5 new tests in test_kani.py covering the snapshot helper
(no-old passthrough, scalar snapshot, slice to_vec, nested old strip,
identifier guard).

Tests: 576 passing (was 571, +5 new).

Expected effect on next write.rs run: pad_to, write_elf64_phdr,
write_elf64_shdr no longer SKIPPED for E0425; their functional specs
verify against the implementation alongside the defensive checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…positives

Phase 1 validation on CCC long_double.rs surfaced a systematic Phase 1
weakness: when the LLM-generated functional_spec contains plain
``+``/``-``/``*`` arithmetic on the function's inputs, Kani evaluates
the spec on adversarial nondet values (``usize::MAX``, ``i64::MIN``,
etc.) and the SPEC itself overflows. The pipeline classifies that as
``check_<fn>.assertion.N: attempt to add with overflow`` — a CEX in
the harness wrapper, not in the function body — and over-reports it
as a REAL_BUG.

Concrete fallout: long_double.rs reported 4 REAL_BUG (f64_decompose,
make_x87_infinity, shift_right_256_with_grs, shifted_limb) — all
spec-evaluation overflow, no actual bug in the function bodies.
inline_asm.rs reported 1 REAL_BUG (is_generic_gp_constraint) — also
spec fabrication (over-simple functional spec).

Two-pronged fix:

1. Prompt update (prompts.py) — add explicit "SPEC-EVALUATION OVERFLOW"
   warning to the entry-spec prompt. Tells the LLM to use
   ``wrapping_add``/``wrapping_mul`` in functional specs, or guard
   the precondition to exclude the overflowing inputs. With concrete
   wrong / right examples.

2. Classifier filter (cex_validator._witness_obvious_artifact) — when
   the failing property starts with ``check_`` (harness wrapper) AND
   the trace mentions ``attempt to {add,sub,mul,shift} with overflow``,
   downgrade to model-artifact (spec-evaluation overflow), not REAL_BUG.
   This catches false positives even when the LLM ignored the prompt
   warning.

The filter is precise: body-level overflow (``<fn>.assertion.N`` without
the check_ prefix) still routes through the REAL_BUG / LATENT
classification normally. Postcondition violations with non-overflow
traces also pass through as real bugs (those are genuine spec/impl
mismatches, like the align_up misalignment finding).

Tests: 7 new unit tests in test_spec_overflow_filter.py covering
add/sub/mul/shift overflow filtering, body-level overflow not filtered,
postcondition violations not filtered, check_*-named fns edge case.
583 passing (was 576, +7 new).

This makes Phase 1 noticeably more conservative in practice: it stays
silent on inputs the LLM should have guarded, instead of producing
false-positive bug reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive writeup of the session: 25 real bugs found in CCC (24
defensive panic-class + 1 functional semantic), 4+ functional
correctness proofs (hash.rs full djb2/SysV equivalence, alignment
invariants on align_up_64, ELF byte-layout on write_elf64_phdr/shdr),
5 Phase 1 false positives (inline_asm + 4 long_double).

Includes: full bug table organized by file + function class, list of
13 bmc-agent improvements landed this session (with commit hashes),
what works well vs what doesn't, threat-model framing clarification,
and the methodology observation about caller-guards-match-inferred-pre.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… positives

Re-running long_double.rs with --enable-realism-check confirmed the
hypothesis: all 3 Phase 1 finds (f64_decompose, shift_right_256_with_grs,
make_x87_infinity) were verdicted UNREALISTIC with high confidence by
the realism check. CLI suppression dropped raw count 3 → final count 0.

Concrete realism reasoning examples:
- f64_decompose: "CBMC assertion violation with no counterexample
  witness values recorded"
- shift_right_256_with_grs: "verification artifact from CBMC's
  symbolic execution, not a real bug"
- make_x87_infinity: identified the x87 byte-layout convention
  mismatch correctly

Updated the session summary with:
- Validated mitigation (realism-check works, not just theoretical)
- Recommended invocation for future CCC runs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

bmc-agent results

1 real_bug verdict(s) on 1 changed file(s):

  • examples/bmc-action-selftest/safe_double.c::safe_double — safe_double.overflow.1

    'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary.

All verdicts (1)
[
  {
    "source": "examples/bmc-action-selftest/safe_double.c",
    "function": "safe_double",
    "outcome": "real_bug",
    "failing_property": "safe_double.overflow.1",
    "reasoning": "'safe_double' is an entry function (no callers in any file). The counterexample is directly reachable from the system boundary."
  }
]

@theyoucheng theyoucheng merged commit a3e6bf5 into main May 19, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant