Pass explicit AnalysisContext; treat VariableRegistry as a global service#1088
Wrap the thread-local analysis inputs (ProgramInfo, options, platform, VariableRegistry) in a single AnalysisContext aggregate, and plumb it explicitly through analyze, ebpf_domain_check, ebpf_domain_transform, and ebpf_domain_initialize_loop_counter. Backward-compatible overloads continue to synthesize the context from thread-locals.

This is a migration step: EbpfDomain internals and a few helpers in ebpf_domain.cpp still reach the thread-locals, so analyze asserts that the caller-supplied VariableRegistry matches variable_registry.get().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…er helpers into instance methods

- ptr_max() in ebpf_domain.hpp now takes max_packet_size directly and no longer reads thread_local_options. The only remaining callers pass thread_local_options.max_packet_size (in ebpf_domain.cpp) or context.options.max_packet_size (in ebpf_transformer.cpp). The duplicate static ptr_max(context) in ebpf_transformer.cpp is gone.
- max_packet_size() helper removed; its sole caller inlines thread_local_options.max_packet_size, making the dependency visible.
- do_load_stack, do_store_stack, recompute_stack_numeric_size were static helpers taking an AnalysisContext and ArrayDomain. Since EbpfTransformer already holds both as members and is always constructed per-instruction, convert them to non-static methods that use this->stack and this->context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…, initialize_packet

These entry-point domain methods previously read thread_local_options, variable_registry, and thread_local_program_info directly. They now receive them explicitly via AnalysisContext.

- setup_entry(bool init_r1, const AnalysisContext&)
- initialize_packet(const AnalysisContext&)
- from_constraints(const set<string>&, bool, const AnalysisContext&) (the string overload only; the typed-constraints overload doesn't touch thread-locals and is unchanged)

Callers:
- fwd_analyzer::analyze forwards its context.
- result.cpp callers construct a context from thread_local_analysis_context.
- test_analysis_context passes its explicit context.

The "migration step" assertion in analyze() remains justified by the map accessors and calculate_constant_limits in ebpf_domain.cpp, which still read thread-locals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…loop_count_upper_bound

The five map accessors (get_map_type, get_map_inner_map_fd, get_map_key_size, get_map_value_size, get_map_max_entries) now take a const ebpf_platform_t& rather than reaching into thread_local_program_info. get_loop_count_upper_bound takes a VariableRegistry& rather than reaching into variable_registry.

Callers (EbpfChecker::ValidMapKeyValue, EbpfTransformer::operator(Call), InterleavedFwdFixpointIterator::max_loop_count) forward the right piece of their already-held AnalysisContext.

The migration-phase assertion in analyze() still holds: only calculate_constant_limits (consumed via LazyAllocator) remains on thread-locals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…yable

The registry is a name -> id cache. Given the same sequence of `make(name)` calls it returns the same Variable ids; clients never observe state beyond that mapping. Reflect this in the API:

- `names` is now `mutable`, and every public factory / enumeration method (`reg`, `cell_var`, `kind_var`, `loop_counter`, `meta_offset`, `packet_size`, `stack_frame_var`, `get_type_variables`, `get_loop_counters`) is `const`. Callers can hold a `const VariableRegistry&` and still construct Variables.
- A header comment explains that `mutable` is for memoization and that the existing thread_local storage is what lets us skip synchronization on `names`.

Copying is deleted: duplicating the cache would let two registries assign different ids to the same name, and since `Variable` is a bare index with no back-pointer, a Variable from one registry would silently misinterpret against the other. Move is kept (it transfers identity rather than forking it).

The one LazyAllocator<VariableRegistry>::operator=(const T&) caller in test_analysis_context switches to `.clear()` — same effect (next get() lazily re-initializes) without relying on copy.

`EbpfDomain::get_loop_count_upper_bound` can now take `const VariableRegistry&` again, which is how it should have been typed from the start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…et to the compat shim

Now that VariableRegistry's public API is const-qualified, the context can hold it by const reference. Callers never need a mutable handle; the registry interns new ids through its internal `mutable` cache. A comment on the field points readers at var_registry.hpp for the why.

Resetting the registry between runs was baked into ebpf_verifier_clear_before_analysis. That made sense when the registry was a thread-local singleton, but once the registry is passed in via the context, its lifetime is the caller's concern. The reset moves to the back-compat shims (analyze(prog) and analyze(prog, inv)), which keep the old "each call starts with a fresh thread-local registry" semantics. The context-taking overloads now just clear the per-analysis ArrayDomain cache and trust the caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…e, drop the thread-local reads

extract_instruction_deps used thread_local_options.total_stack_size() when normalizing derived-pointer stack offsets back into R10-relative ones. Accept the stack size as an explicit argument instead; the fwd_analyzer caller passes context.options.total_stack_size(), and the test callers pass a default-constructed ebpf_verifier_options_t's value.

RelevantState used thread_local_options.total_stack_size() inside is_relevant_constraint to convert relative stack offsets to absolute "s[...]" names for pattern matching. Store the size on the struct at construction (from context.options.total_stack_size()). The join_relevance copy in printing.cpp propagates it from the first parent's state.

Also: is_valid_after / check_observation_at_label already construct an AnalysisContext locally; switch their remaining setup_constraints reads from thread_local_options to context.options for consistency.

After this change, result.cpp has no remaining thread-local reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…ading thread_local_options

AssertExtractor's Mem overload had the only thread_local_options read in the IR layer: a comparison against options.subprogram_stack_size when the access is R10-relative. Carry the options explicitly through AssertExtractor, get_assertions, and its three callers (Program::from_sequence passes what it already has; EbpfChecker passes context.options; test_marshal passes a default-constructed options).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…hread_local_options

Unmarshaller already stores subprogram_stack_size as a per-run field populated from the options passed to unmarshal(). Add allow_division_by_zero next to it, for the same reason, so the div-by-zero warning stops relying on thread_local_options being set up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
add_cfg_nodes used thread_local_options.max_call_stack_frames for its depth check; instruction_seq_to_cfg and resolve_pseudo_load used thread_local_program_info for the platform and map descriptors. They now take what they need explicitly: max_call_stack_frames for the nesting check, ProgramInfo for the platform and map descriptors. Program::from_sequence passes its already-held info/options.

Note: the callback-target post-processing in Program::from_sequence still mutates thread_local_program_info directly. That's tied to how EbpfChecker reads callback_target_labels/_with_exit through context.program_info — ProgramInfo holds these sets, and the compat shim in thread_local_analysis_context reads the same thread-local. Leaving that path alone for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…bottom

Stack content has no inherent bottom of its own; bottom for the whole EbpfDomain is now represented uniformly by clearing all fields: state goes to TypeToNumDomain::bottom() and stack drops to nullopt. This is the AddBottom pattern already used for NumAbsDomain, applied externally to ArrayDomain.

Invariant: state.is_bottom() <=> !stack.has_value(). Enforced by factories (bottom(), set_to_bottom()) and by binary ops (which short-circuit on bottom inputs and return bottom() outputs). The transformer's mid-instruction mutations may temporarily violate the invariant; ebpf_domain_transform normalizes at the end.

Default construction no longer reads thread_local_options for stack sizing — bottom needs no size. Non-bottom factories (top, setup_entry, from_constraints string overload) take an AnalysisContext and pass total_stack_size through to a new sized BitsetDomain ctor. A size-only top(size_t) overload is provided for tests that don't have a full context handy.

calculate_constant_limits still reads thread_local_options because it is consumed via a zero-arg LazyAllocator factory. Made explicit with a TODO; will be addressed in stage 3 when the constant-limits cache is restructured.

A leaked-behavior bug is preserved (and flagged) in to_set(): a bottom domain serializes as top because several YAML tests encode that historical answer in their `post: []` expectations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
BitsetDomain::is_bottom() always returned false and set_to_bottom() just reset the bitset to a non-bottom state — the bottom concept was fictional. ArrayDomain forwarded both unchanged. Now that EbpfDomain wraps stack in std::optional and represents bottom externally (AddBottom-style), neither layer needs the fake.

Removed: is_bottom() and set_to_bottom() on both classes; the dead `if (is_bottom())` branch in BitsetDomain::to_set().
…ugh widen and calculate_constant_limits

The size-less default ctors were the last paths in BitsetDomain / ArrayDomain that read thread_local_options. With every non-bottom construction site now going through the size-taking ctors, the defaults are unused and removed.

EbpfDomain::calculate_constant_limits is no longer a zero-arg LazyAllocator factory. It takes the AnalysisContext explicitly, reading total_stack_size, packet_size, check_for_termination, and loop counters off the context instead of thread-locals. The cache is gone; widen recomputes on demand. The cost (~100 add_constraint calls per widen-with-constants) is small.

EbpfDomain::widen now takes context too. The single internal caller (extrapolate in fwd_analyzer.cpp) forwards its context through.

clear_thread_local_state on EbpfDomain is removed (the cache it cleared no longer exists); ebpf_verifier_clear_thread_local_state drops the corresponding call.

The typed overload of from_constraints picks up a defaulted total_stack_size parameter so test callers (test_subsumption, test_join) keep working without a context.

After this change ebpf_domain.cpp has zero thread-local reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
reg_pack(i) is just a bulk version of reg(kind, i): every field is registry->reg(SomeKind, i). Move it next to its peers (reg, type_reg, cell_var, kind_var, …) on VariableRegistry, and move RegPack itself into var_registry.hpp where its fields are produced.

EbpfTransformer and EbpfChecker get same-named member methods that delegate to context.variables.reg_pack — so the existing ~37 unqualified `reg_pack(...)` calls inside those classes resolve to the member with no source change. Other callers that hold a context (setup_entry, calculate_constant_limits, do_load_ctx) call context.variables.reg_pack directly. do_load_packet_or_shared takes a VariableRegistry parameter.

The free `reg_pack(int)` and `reg_pack(Reg)` shims in type_to_num.hpp remain (using the thread-local registry) for the call sites that don't yet have a context: TypeToNumDomain methods, EbpfDomain const methods (get_r0, get_stack_offset, get_map_fd_range), and tests. Those will migrate in subsequent commits as their callers are parameterized.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
…ervice

VariableRegistry is a name-interning service: name -> stable id, with the only correctness requirement that the same name maps to the same id within an analysis. Two analyses sharing one registry is fine (monotonic growth across runs, but no observable effect). It belongs in the same conceptual bucket as malloc — infrastructure, not per-analysis state. So:

- AnalysisContext loses its `variables` field. The struct is now the three things that genuinely vary per analysis: program_info, options, platform.
- The migration-phase asserts in `analyze(..., context)` go away — they have nothing to assert.
- The compat shims `analyze(prog)` / `analyze(prog, inv)` no longer reset the registry. Names accumulate across runs; that's fine.
- Domain code that briefly went through `context.variables.X` reverts to `variable_registry->X`. Convenience reg_pack methods on EbpfTransformer / EbpfChecker are removed (the free `reg_pack` shim in type_to_num.hpp is enough).
- `EbpfDomain::get_loop_count_upper_bound` drops its registry parameter.
- `do_load_packet_or_shared` drops the registry parameter we'd added.

Tests stay green (1297 / 236 expected-fail / 1 skipped).

The bulk of the previous context-plumbing work — `AnalysisContext` itself, `setup_entry`/`from_constraints`/`top` taking context, the options-and-platform threading — keeps its value: those things ARE per-analysis. Only the registry plumbing was solving the wrong problem.

The Rust port (../prevail-rust) takes the opposite tack and threads &mut VariableRegistry everywhere because the borrow checker mandates it. C++ doesn't have that constraint, so the global is fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
The registry doesn't need lazy construction (it's used by every analysis, so first-touch eager init is fine) and explicitly never clears (names accumulate monotonically, like a string pool). The LazyAllocator wrapper exposed clear/set/operator= that we'd never call and would invite misuse, so collapse the wrapping into a plain `thread_local VariableRegistry variable_registry;`. A header-side comment explains both choices in a few lines.

The ~80 callers switch from `variable_registry->X` to `variable_registry.X`. Mechanical sed.

test_analysis_context.cpp was demonstrating that an explicit AnalysisContext could carry a different VariableRegistry from the thread-local. With the registry now correctly modeled as a global service (not part of AnalysisContext), there's nothing left to demonstrate; delete the file.
`from_constraints` was the visible symptom: mutating state.types via restrict_to / assume_eq_types could push the domain to bottom while leaving stack materialized, breaking the documented "state.is_bottom() <=> !stack" invariant. to_set() then printed the stack content instead of "_|_". Root cause: those direct accesses bypassed any EbpfDomain wrapper, and the existing add_value_constraint wrapper didn't normalize either.

Fix at the right layer: every EbpfDomain-level state mutator (add_value_constraint, assume_eq_types, restrict_type) calls a single normalize_after_state_mutation helper that drops stack when state goes bottom. Both from_constraints overloads now route through these wrappers instead of poking state.types directly. The invariant is re-established at the point the divergence could arise, not defensively at the call site.

The numeric_ranges loop in from_constraints(string) gets one explicit is_bottom() check before dereferencing inv.stack — the loop only makes sense on a non-bottom domain.
The two loop-counter call sites — calculate_constant_limits and get_loop_count_upper_bound — were asking variable_registry "what loop counters exist?" That question is analysis-specific (per-program loop heads), not registry-state. With the registry as a true global, the registry accumulates counters across analyses on the same thread, so the answer is wrong: get_loop_count_upper_bound went to +∞ as soon as any prior analysis introduced loop counters not in the current state's numeric domain (which evaluate to top), making max_loop_count silently saturate at INT_MAX.

Source the set from the right place: InterleavedFwdFixpointIterator already walks _wto.for_each_loop_head to initialize counters. Compute the corresponding Variables once in the analyzer's constructor and thread the std::span<const Variable> through extrapolate -> widen -> calculate_constant_limits, and through max_loop_count -> get_loop_count_upper_bound. The registry never gets asked.

Tests stay green. The bug wasn't caught by tests because no test pins a specific max_loop_count value — but multi-program runs in long-running processes were silently reporting unbounded loops.
The four remaining variable_registry.get_type_variables() callers were asking the global interner "what type variables exist?" — that's analysis-state, not registry-state. With the registry as a permanent global, the answer accumulates across runs on the same thread.

TypeDomain knows its own tracked variables (the var_ids map). Expose two queries:

- variables(): the full set this domain tracks. Used by the join / subsumption helpers in TypeToNumDomain (operator<= and operator|=) that genuinely need to iterate per-variable structure.
- variables_with_type(type): tracked variables whose TypeSet may contain `type`. Used where the question is "which variables might have type T?" (havoc_all_locations_having_type, do_store_stack's stack_numeric_size update).

For collect_type_dependent_constraints we iterate the union of the two operands' tracked variables, since a variable tracked on only one side can still have differing type info.

variable_registry.get_type_variables() is removed — no callers left. The std::string ends_with helper that supported it goes too.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/crab/type_to_num.cpp (1)
114-128: 🧹 Nitpick | 🔵 Trivial

Iterating `types.variables()` instead of every registered type var is a precision improvement — worth noting soundness explicitly.

The previous implementation walked all registry-wide type variables; the new one only walks variables tracked in `this`. For an untracked variable, the abstract type is top (any type is possible), so none of its kinds are "nonexistent" and no havoc of `tmp.values` is warranted — meaning this change makes `operator<=` strictly stricter (less likely to return `true` spuriously), which is sound.

As per coding guidelines: "Any change that affects analysis results should spell out the argument for soundness: what inputs are assumed, what invariants are maintained, and how the change preserves them." Consider adding a one-liner comment on line 116 capturing that reasoning so auditors don't have to re-derive it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crab/type_to_num.cpp` around lines 114 - 128, add a one-line comment inside TypeToNumDomain::get_nonexistent_kind_variables (near the loop over types.variables()) explaining that we intentionally iterate only the variables tracked in this->types (not all registry-wide type vars) because untracked variables are treated as top (all types possible) so no kinds are "nonexistent" for them; mention that this preserves soundness for operator<= and prevents spurious inclusion in tmp.values, and reference type_to_kinds and variable_registry.kind_var for context.

src/result.hpp (1)
66-74: ⚠️ Potential issue | 🟡 Minor

`total_stack_size = 0` default is a silent-wrong-answer hazard.

`RelevantState` is passed around by value and constructed in several places (e.g., `join_relevance` on printing.cpp:815, where only the primary label's field is copied into the merged state). A zero default means any code path that forgets to seed it will quietly translate relative offsets against a stack size of 0 rather than failing loudly.

Either drop the default and force callers to supply it, or make translation assert `total_stack_size > 0` before using it. A short comment here stating "must match `options.total_stack_size()` of the owning analysis" would also help.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/result.hpp` around lines 66 - 74, RelevantState's total_stack_size must not default to 0 because callers sometimes copy/construct it without seeding and that silently produces wrong translations; remove the "= 0" default, add an explicit constructor (e.g., RelevantState(int total_stack_size)) that initializes total_stack_size and keeps registers/stack_offsets default-initialized, update all call sites that create RelevantState (including the merge in join_relevance) to pass the correct options.total_stack_size(), and add a short comment on the field that it must match options.total_stack_size() of the owning analysis.

src/crab/var_registry.cpp (1)
232-240: 🧹 Nitpick | 🔵 Trivial

`get_loop_counters()` does a redundant O(n) lookup per counter and matches too loose a prefix.

Each matching `name` is re-interned via `intern(name)`, which scans the entire `names` vector again — so this is O(n²) in the number of interned names, even though we already know the index while iterating. Also, `loop_counter(label)` produces `"pc[" + label + "]"`, but the filter uses bare `"pc"`, which would falsely match any future name starting with `pc` (e.g. `pc_whatever`).

♻️ Proposed refactor
```diff
 std::vector<Variable> VariableRegistry::get_loop_counters() const {
     std::vector<Variable> res;
-    for (const std::string& name : names) {
-        if (name.starts_with("pc")) {
-            res.push_back(intern(name));
-        }
+    for (size_t i = 0; i < names.size(); ++i) {
+        if (names[i].starts_with("pc[")) {
+            res.push_back(Variable(i));
+        }
     }
     return res;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crab/var_registry.cpp` around lines 232 - 240, the get_loop_counters() implementation currently re-interns each matching name causing O(n²) work and uses the too-broad prefix "pc"; change VariableRegistry::get_loop_counters to iterate with an index over names, match the exact "pc[" prefix (since loop_counter(label) formats as "pc[" + label + "]"), and construct Variables directly from the known index/slot rather than calling intern(name) again so you avoid the extra scan.

src/crab/ebpf_domain.cpp (2)
289-308: ⚠️ Potential issue | 🟠 Major

Wrong field compared in `get_map_inner_map_fd` — `map->type` vs. accumulated `inner_map_fd`.

After seeding `inner_map_fd = map->inner_map_fd`, subsequent iterations compare `map->type != *inner_map_fd` instead of `map->inner_map_fd != *inner_map_fd`. That means the consistency check across the fd range is effectively checking the wrong descriptor field: two maps whose `inner_map_fd` values differ will pass this guard as long as their `type`s happen to match the first seen `inner_map_fd` (and vice versa). The function can then return a spurious "agreed" inner_map_fd, which is a soundness hazard for any caller keying off it.

Likely pre-existing, but worth fixing in this PR since the signature is being reworked here anyway.

🐛 Proposed fix

```diff
 if (!inner_map_fd.has_value()) {
     inner_map_fd = map->inner_map_fd;
-} else if (map->type != *inner_map_fd) {
+} else if (map->inner_map_fd != *inner_map_fd) {
     return {};
 }
```

As per coding guidelines: "Any change that affects analysis results should spell out the argument for soundness: what inputs are assumed, what invariants are maintained, and how the change preserves them."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crab/ebpf_domain.cpp` around lines 289 - 308, The function get_map_inner_map_fd seeds inner_map_fd from map->inner_map_fd but then erroneously compares map->type to *inner_map_fd; change the consistency check so each subsequent iteration compares map->inner_map_fd to *inner_map_fd (not map->type) and return {} on mismatch. Locate the loop in get_map_inner_map_fd, keep the initial seeding of inner_map_fd from map->inner_map_fd, and replace the map->type != *inner_map_fd condition with map->inner_map_fd != *inner_map_fd to ensure the returned inner_map_fd is consistent across the full fd range obtained via get_map_fd_range and platform.get_map_descriptor.
267-286: ⚠️ Potential issue | 🟡 Minor

Dead null check on a reference-returning accessor.

`platform.get_map_descriptor(map_fd)` returns a reference, so `&platform.get_map_descriptor(...)` is never null and the `if (map == nullptr) return {};` guard is unreachable. Same pattern at lines 297, 319, 337, 355. Pre-existing, but now that `platform` is passed explicitly the contract is clearer — either drop the dead branch, or switch to a nullable API if invalid fds should produce `{}` instead of relying on whatever `get_map_descriptor` does internally for unknown fds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crab/ebpf_domain.cpp` around lines 267 - 286, platform.get_map_descriptor(...) returns a reference so taking its address and checking for nullptr is dead code; change the local variable to a reference (EbpfMapDescriptor& map = platform.get_map_descriptor(map_fd)) and remove the unreachable if (map == nullptr) return {} check in EbpfDomain::get_map_type, and apply the same fix to the other similar sites where you use &platform.get_map_descriptor(...) (the checks at the other occurrences should be removed or replaced by reference usage). If the intended semantics are to handle invalid fds, instead change get_map_descriptor to return a nullable type (pointer or std::optional) and update callers accordingly — pick one approach and make it consistent across all occurrences.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/analysis_context.hpp`:
- Around line 20-24: AnalysisContext contains non-owning const references
(program_info, options, platform) and can silently dangle if the referents are
shorter-lived; add a one-line doc comment immediately above struct
AnalysisContext clarifying the lifetime contract (e.g., that an AnalysisContext
is non-owning and must not outlive the ProgramInfo/options/platform used to
construct it and must not be cached beyond the creating analysis call), and
optionally add a brief TODO or assertion location comment where AnalysisContext
instances are created to remind callers to enforce that precondition; reference
the struct AnalysisContext and its members program_info, options, platform in
the comment.
- Around line 5-10: Remove the unused include of "crab/var_registry.hpp" from
the top of analysis_context.hpp: delete the `#include` "crab/var_registry.hpp"
line so AnalysisContext and its doc comment about variable_registry remain but
the registry header is not pulled in transitively; verify no symbols from
var_registry are referenced in the file (if any are, replace with a forward
declaration or adjust references) and run a rebuild to ensure no missing
dependencies.
In `@src/crab/ebpf_checker.cpp`:
- Around line 221-225: The checks in ebpf_checker.cpp use
context.program_info.callback_target_labels and callback_targets_with_exit but
Program::from_sequence() currently builds those sets on
thread_local_program_info, so the AnalysisContext's ProgramInfo stays empty and
valid callbacks are rejected; move the derived metadata population into the
ProgramInfo instance that AnalysisContext carries (or into Program) so that
context.program_info.callback_target_labels and
context.program_info.callback_targets_with_exit are populated before
ebpf_checker.cpp's checks run, ensuring the checks see the same sets regardless
of explicit-context vs thread-local paths (update Program::from_sequence() or
the code that constructs ProgramInfo to fill these fields, and remove reliance
on thread_local_program_info).
In `@src/crab/ebpf_domain.cpp`:
- Around line 22-31: EbpfDomain::to_set currently returns StringInvariant::top()
when stack is null to satisfy existing YAML tests, but it should return
StringInvariant::bottom(); change the return in EbpfDomain::to_set from
StringInvariant::top() to StringInvariant::bottom() and update any dependent
logic as needed, and create a tracking artifact (issue or PR stub) referencing
this change and the required YAML test updates so the test expectations (post:
[]) are amended in the same change; use the symbols EbpfDomain::to_set,
StringInvariant::bottom(), StringInvariant::top() in the tracking description.
In `@src/crab/ebpf_domain.hpp`:
- Around line 128-133: The comment claims all mutators maintain the
`state.is_bottom() <=> !stack` invariant via normalize_after_state_mutation, but
havoc(Variable) (implemented as state.values.havoc(var)) does not call
normalize_after_state_mutation; either defensively invoke
normalize_after_state_mutation() at the end of havoc(Variable) in
ebpf_domain.cpp to preserve the documented invariant (call
normalize_after_state_mutation() after state.values.havoc(var)), or update the
comment in ebpf_domain.hpp to exclude havoc from that group and add a short
rationale referencing havoc/ state.values.havoc explaining why it cannot drive
the numeric domain to bottom so the invariant still holds.
In `@src/crab/ebpf_transformer.cpp`:
- Around line 32-36: The constructor for EbpfTransformer currently
unconditionally dereferences _dom.stack (stack(*_dom.stack)) which can occur
when entry_inv is bottom; fix by guarding creation: before constructing
EbpfTransformer in ebpf_domain_transform (and the other site referenced around
1493-1495), check that _dom.stack has a value (e.g., if (!_dom.stack)
return/skip/handle bottom) or ensure entry_inv is satisfiable; alternatively
change EbpfTransformer to not dereference in its initializer (store a
pointer/optional reference to _dom.stack and only .value() after entry_inv is
confirmed non-bottom inside EbpfTransformer::run()). Ensure you update all
construction sites (including the ones noted) accordingly.
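The guard-before-construction option can be sketched like this; the `Domain`/`Transformer` types are simplified stand-ins for `EbpfDomain`/`EbpfTransformer`, assuming only the `std::optional` stack member described in this PR:

```cpp
#include <cassert>
#include <optional>

struct ArrayDomain { int cells = 0; };

struct Domain {
    std::optional<ArrayDomain> stack; // empty <=> domain is bottom
    bool is_bottom() const { return !stack.has_value(); }
};

struct Transformer {
    ArrayDomain& stack; // safe: only constructed once the stack is known to exist
    explicit Transformer(Domain& dom) : stack(*dom.stack) {}
    void run() { stack.cells += 1; }
};

// Entry point guards before the constructor can dereference a nullopt.
bool transform(Domain& dom) {
    if (dom.is_bottom()) {
        return false; // nothing to do on an unreachable/bottom state
    }
    Transformer{dom}.run();
    return true;
}
```

The alternative in the review — storing a pointer and deferring `.value()` until inside `run()` — trades an early guard for a later one; the guard-at-construction version keeps the transformer's reference member unconditionally valid.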
In `@src/crab/type_to_num.cpp`:
- Around line 209-218: The function
TypeToNumDomain::havoc_all_locations_having_type currently assumes
TypeDomain::variables_with_type returns empty when bottom; make that
precondition explicit by checking the domain bottom state at the start of
havoc_all_locations_having_type — e.g., add an assert(!is_bottom()) (or an
explicit if (is_bottom()) return;) and a short comment referencing the reliance
on variables_with_type's contract so the function fails fast if bottom-handling
changes in the future. Ensure the check uses TypeToNumDomain::is_bottom() to
locate the contract enforcement.
In `@src/result.cpp`:
- Around line 114-116: The AnalysisResult code currently rebuilds
AnalysisContext from thread_local_analysis_context() (see the calls near
EbpfDomain::from_constraints in result.cpp and the similar sites at 130-134 and
386-387), which causes results to depend on ambient TLS; change the
implementation to stop reading TLS: either fetch the AnalysisContext stored on
the AnalysisResult instance (add or use an existing member) or require an
explicit AnalysisContext parameter to the function that creates the EbpfDomain,
and pass that context into EbpfDomain::from_constraints instead of calling
thread_local_analysis_context(); update all similar call sites (the three
locations noted) to use the stored/explicit context.
In `@src/test/test_elf_loader.cpp`:
- Line 562: The SECTION call is on a single line while other SECTION blocks use
multi-line formatting; change SECTION("0x80000000 exceeds int32 — bails out
without mutation") { check_bailout_preserves_program(0x80000000ULL); } to the
same multi-line style used elsewhere (open SECTION on its own line, put the
check_bailout_preserves_program(0x80000000ULL); call on the next line, then
close the SECTION block on its own line) so the SECTION and the call to
check_bailout_preserves_program remain consistently formatted with surrounding
tests.
---
Outside diff comments:
In `@src/crab/ebpf_domain.cpp`:
- Around line 289-308: The function get_map_inner_map_fd seeds inner_map_fd from
map->inner_map_fd but then erroneously compares map->type to *inner_map_fd;
change the consistency check so each subsequent iteration compares
map->inner_map_fd to *inner_map_fd (not map->type) and return {} on mismatch.
Locate the loop in get_map_inner_map_fd, keep the initial seeding of
inner_map_fd from map->inner_map_fd, and replace the map->type != *inner_map_fd
condition with map->inner_map_fd != *inner_map_fd to ensure the returned
inner_map_fd is consistent across the full fd range obtained via
get_map_fd_range and platform.get_map_descriptor.
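The corrected loop can be sketched as follows; the descriptor type and the range parameter are simplified assumptions standing in for `platform.get_map_descriptor` over `get_map_fd_range`:

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct MapDescriptor { int type; int inner_map_fd; };

// Every descriptor in the fd range must agree on inner_map_fd; the
// comparison uses inner_map_fd -- not type -- against the seeded value.
std::optional<int> get_inner_map_fd(const std::vector<MapDescriptor>& range) {
    std::optional<int> inner_map_fd;
    for (const auto& map : range) {
        if (!inner_map_fd) {
            inner_map_fd = map.inner_map_fd; // seed from the first descriptor
        } else if (map.inner_map_fd != *inner_map_fd) { // was: map.type != *inner_map_fd
            return {};
        }
    }
    return inner_map_fd;
}
```

With the buggy comparison, two maps of different types but identical inner fds would spuriously report a mismatch, and a map whose type happened to equal the fd would mask a real one.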
- Around line 267-286: platform.get_map_descriptor(...) returns a reference so
taking its address and checking for nullptr is dead code; change the local
variable to a reference (EbpfMapDescriptor& map =
platform.get_map_descriptor(map_fd)) and remove the unreachable if (map ==
nullptr) return {} check in EbpfDomain::get_map_type, and apply the same fix to
the other similar sites where you use &platform.get_map_descriptor(...) (the
checks at the other occurrences should be removed or replaced by reference
usage). If the intended semantics are to handle invalid fds, instead change
get_map_descriptor to return a nullable type (pointer or std::optional)
and update callers accordingly — pick one approach and make it consistent across
all occurrences.
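The reference-binding fix looks like this in miniature; `Platform` and the descriptor contents are illustrative, only the return-by-reference shape comes from the review:

```cpp
#include <cassert>

struct EbpfMapDescriptor { int type; };

struct Platform {
    EbpfMapDescriptor desc{4};
    // Returns a reference, so taking its address and null-checking it is dead code.
    EbpfMapDescriptor& get_map_descriptor(int /*map_fd*/) { return desc; }
};

int get_map_type(Platform& platform, int map_fd) {
    // Bind a reference directly; the old unreachable null check is dropped.
    EbpfMapDescriptor& map = platform.get_map_descriptor(map_fd);
    return map.type;
}
```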
In `@src/crab/type_to_num.cpp`:
- Around line 114-128: Add a one-line comment inside
TypeToNumDomain::get_nonexistent_kind_variables (near the loop over
types.variables()) explaining that we intentionally iterate only the variables
tracked in this->types (not all registry-wide type vars) because untracked
variables are treated as top (all types possible) so no kinds are "nonexistent"
for them; mention that this preserves soundness for operator<= and prevents
spurious inclusion in tmp.values, and reference type_to_kinds and
variable_registry.kind_var for context.
In `@src/crab/var_registry.cpp`:
- Around line 232-240: The get_loop_counters() implementation currently
re-interns each matching name causing O(n²) work and uses too-broad prefix "pc";
change VariableRegistry::get_loop_counters to iterate with an index over names,
match the exact "pc[" prefix (since loop_counter(label) formats as "pc[" + label
+ "]"), and construct Variables directly from the known index/slot rather than
calling intern(name) again so you avoid the extra scan.
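The suggested rewrite can be sketched as a single indexed pass; `Registry` and `Variable` here are simplified assumptions about the interning table, and only the `"pc["` prefix convention comes from the review:

```cpp
#include <string>
#include <vector>

struct Variable { std::size_t slot; };

struct Registry {
    std::vector<std::string> names; // slot i holds the name of Variable{i}

    std::vector<Variable> get_loop_counters() const {
        std::vector<Variable> result;
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (names[i].rfind("pc[", 0) == 0) { // exact "pc[" prefix, not just "pc"
                result.push_back(Variable{i});   // construct from the slot index: no re-intern
            }
        }
        return result;
    }
};
```

The two fixes compose: the exact prefix stops names like `pcap_len` from matching, and constructing from the index removes the O(n) `intern(name)` rescan per hit that made the old version O(n²).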
In `@src/result.hpp`:
- Around line 66-74: RelevantState's total_stack_size must not default to 0
because callers sometimes copy/construct it without seeding and that silently
produces wrong translations; remove the "= 0" default, add an explicit
constructor (e.g., RelevantState(int total_stack_size)) that initializes
total_stack_size and keeps registers/stack_offsets default-initialized, update
all call sites that create RelevantState (including the merge in join_relevance)
to pass the correct options.total_stack_size(), and add a short comment on the
field that it must match options.total_stack_size() of the owning analysis.
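The suggested shape, sketched with abridged field types (the real struct carries register and stack-offset relevance; the offset check mirrors the `total_stack_size + rel_offset` computation discussed later in this thread):

```cpp
#include <cassert>
#include <set>
#include <string>

struct RelevantState {
    // Must match options.total_stack_size() of the owning analysis.
    int total_stack_size;
    std::set<std::string> registers; // default-initialized
    std::set<int> stack_offsets;     // default-initialized

    // No default ctor: every construction site must state the stack size.
    explicit RelevantState(int total_stack_size) : total_stack_size(total_stack_size) {}

    bool is_relevant_offset(int rel_offset, int lo, int hi) const {
        assert(total_stack_size >= 0); // fail loudly on uninitialized state
        const int abs = total_stack_size + rel_offset;
        return lo <= abs && abs < hi;
    }
};
```

With the default removed, `RelevantState s;` no longer compiles; callers such as the merge in `join_relevance` are forced to pass `options.total_stack_size()` explicitly.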
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3c0686dc-13ed-4737-ab14-d4fdffae0d13
📒 Files selected for processing (30)
- src/analysis_context.hpp
- src/crab/array_domain.cpp
- src/crab/array_domain.hpp
- src/crab/bitset_domain.cpp
- src/crab/bitset_domain.hpp
- src/crab/ebpf_checker.cpp
- src/crab/ebpf_domain.cpp
- src/crab/ebpf_domain.hpp
- src/crab/ebpf_transformer.cpp
- src/crab/finite_domain.cpp
- src/crab/type_domain.cpp
- src/crab/type_domain.hpp
- src/crab/type_to_num.cpp
- src/crab/type_to_num.hpp
- src/crab/var_registry.cpp
- src/crab/var_registry.hpp
- src/crab/zone_domain.cpp
- src/fwd_analyzer.cpp
- src/ir/assertions.cpp
- src/ir/cfg_builder.cpp
- src/ir/parse.cpp
- src/ir/program.hpp
- src/ir/unmarshal.cpp
- src/printing.cpp
- src/result.cpp
- src/result.hpp
- src/test/test_elf_loader.cpp
- src/test/test_failure_slice.cpp
- src/test/test_marshal.cpp
- src/verifier.hpp
💤 Files with no reviewable changes (1)
- src/crab/bitset_domain.cpp
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6141700718
- Guard ebpf_domain_initialize_loop_counter against bottom domains; the EbpfTransformer ctor dereferences dom.stack, which is nullopt when bottom.
- Assert !is_bottom() in TypeToNumDomain::havoc_all_locations_having_type so the reliance on variables_with_type returning empty for bottom fails loudly if that contract ever changes.
- Drop the unused crab/var_registry.hpp include from analysis_context.hpp.
- Spell out AnalysisContext's non-owning lifetime contract.
- Clarify in ebpf_domain.hpp that havoc does not need normalization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/crab/ebpf_domain.hpp`:
- Around line 100-102: Remove the implicit default for total_stack_size on
EbpfDomain::from_constraints so callers must supply it explicitly: update the
declaration and corresponding definition of EbpfDomain::from_constraints to
remove the default argument (the parameter remains size_t total_stack_size), and
update all call sites (including tests) to pass an explicit value (use
ebpf_verifier_options_t{}.total_stack_size() in tests where the default behavior
is intended). Ensure any overloads or wrappers around from_constraints are
adjusted likewise to avoid implicit defaults.
In `@src/crab/type_to_num.cpp`:
- Around line 135-144: The current code builds type_vars as a std::set which
gives deterministic ordering but incurs O(log n) inserts; change type_vars to a
std::unordered_set<Variable> when collecting from types.variables() and
right.types.variables() to get amortized O(1) inserts (then iterate that
unordered_set when computing per-variable work for join_selective); if
deterministic iteration is actually required downstream (e.g. some future
non-commutative use of values.set(...)), sort the collected elements into a
vector after deduplication before iterating; ensure Variable is hashable
(add/enable a std::hash<Variable> specialization) or fall back to std::set if
hashing isn’t available.
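A sketch of the full suggestion — hashable `Variable`, amortized O(1) deduplication, then a one-time sort where deterministic iteration is needed. `Variable` and `collect_type_vars` are simplified stand-ins for the per-state type-variable collection in `type_to_num.cpp`:

```cpp
#include <algorithm>
#include <functional>
#include <unordered_set>
#include <vector>

struct Variable {
    std::size_t id;
    bool operator==(const Variable& o) const { return id == o.id; }
    bool operator<(const Variable& o) const { return id < o.id; }
};

// Enable std::unordered_set<Variable> (the review's precondition).
template <> struct std::hash<Variable> {
    std::size_t operator()(const Variable& v) const { return std::hash<std::size_t>{}(v.id); }
};

// Collect from both sides with O(1) amortized inserts, then sort once
// if a deterministic order is required downstream.
std::vector<Variable> collect_type_vars(const std::vector<Variable>& lhs,
                                        const std::vector<Variable>& rhs) {
    std::unordered_set<Variable> dedup(lhs.begin(), lhs.end());
    dedup.insert(rhs.begin(), rhs.end());
    std::vector<Variable> ordered(dedup.begin(), dedup.end());
    std::sort(ordered.begin(), ordered.end());
    return ordered;
}
```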
📒 Files selected for processing (4)
- src/analysis_context.hpp
- src/crab/ebpf_domain.hpp
- src/crab/ebpf_transformer.cpp
- src/crab/type_to_num.cpp
Resolve conflict in compute_failure_slices: keep main's compute_slice_from_label extraction while preserving the PR's explicit AnalysisContext for ebpf_domain_check and RelevantState init. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/result.cpp (1)
430-610: ⚠️ Potential issue | 🔴 Critical
Thread `seed_relevance.total_stack_size` through every `RelevantState` insertion into maps — stack constraint filtering is silently broken.
`seed_relevance.total_stack_size` arrives via parameter but never propagates into `visited[current_label]`, `slice_labels[current_label]`, or `join_expansion` entries. Every insertion uses a default-constructed `RelevantState` with `total_stack_size = 0`.
`is_relevant_constraint` computes absolute stack offsets as `total_stack_size + rel_offset` (line ~69 in result.cpp). With zeroed `total_stack_size`, all stack range checks fail silently: constraints involving `s[...]` get compared against wrong absolute ranges.
Apply the fix at these locations:
- Line 432: `existing.total_stack_size = seed_relevance.total_stack_size;` after `visited[current_label]`
- Lines 546-549: `existing.total_stack_size = seed_relevance.total_stack_size;` after `slice_labels[current_label]`
- Lines 555-557: `existing.total_stack_size = seed_relevance.total_stack_size;` after `slice_labels[current_label]`
- Lines 594-603: set `total_stack_size` when populating `join_expansion`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/result.cpp` around lines 430 - 610, The RelevantState.total_stack_size from seed_relevance is not being propagated when inserting into maps, breaking stack-offset checks; update every place a RelevantState is default-constructed/merged so its total_stack_size is set to seed_relevance.total_stack_size: after creating/merging visited[current_label] set existing.total_stack_size = seed_relevance.total_stack_size; after merging into slice_labels[current_label] (both in the instruction_contributes branch and the no-deps branch) set that existing.total_stack_size = seed_relevance.total_stack_size; and when inserting entries into join_expansion (both for the join label and for predecessor entries) initialize their .total_stack_size from seed_relevance.total_stack_size (or from the visited entry used when copying relevance). Ensure each inserted/merged RelevantState preserves or copies total_stack_size rather than leaving it zero.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/result.hpp`:
- Around line 66-70: The field RelevantState::total_stack_size currently
defaults to 0 which silently mis-translates relative stack offsets; update
RelevantState so total_stack_size has no default (or use an explicit sentinel
like -1) and ensure callers like compute_slice_from_label initialize it from the
seed, then add an assertion at the top of RelevantState::is_relevant_constraint
(assert(total_stack_size >= 0)) before computing abs = total_stack_size +
rel_offset to fail loudly on uninitialized state.
---
Outside diff comments:
In `@src/result.cpp`:
- Around line 430-610: The RelevantState.total_stack_size from seed_relevance is
not being propagated when inserting into maps, breaking stack-offset checks;
update every place a RelevantState is default-constructed/merged so its
total_stack_size is set to seed_relevance.total_stack_size: after
creating/merging visited[current_label] set existing.total_stack_size =
seed_relevance.total_stack_size; after merging into slice_labels[current_label]
(both in the instruction_contributes branch and the no-deps branch) set that
existing.total_stack_size = seed_relevance.total_stack_size; and when inserting
entries into join_expansion (both for the join label and for predecessor
entries) initialize their .total_stack_size from seed_relevance.total_stack_size
(or from the visited entry used when copying relevance). Ensure each
inserted/merged RelevantState preserves or copies total_stack_size rather than
leaving it zero.
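One way to make the propagation rule hard to forget is to route every merge through a single helper that copies the seed's value. A sketch under that assumption (the field names are abridged; `merge_into` is a hypothetical helper, not a function in the repo):

```cpp
#include <cassert>
#include <set>

struct RelevantState {
    int total_stack_size = 0;
    std::set<int> stack_offsets;
};

// All insertions into visited/slice_labels/join_expansion go through here,
// so total_stack_size can never be left at the default 0.
void merge_into(RelevantState& existing, const RelevantState& incoming,
                int seed_total_stack_size) {
    existing.total_stack_size = seed_total_stack_size;
    // A merged-in state must carry either no size yet or the same size.
    assert(incoming.total_stack_size == 0 ||
           incoming.total_stack_size == seed_total_stack_size);
    existing.stack_offsets.insert(incoming.stack_offsets.begin(),
                                  incoming.stack_offsets.end());
}
```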
📒 Files selected for processing (2)
- src/result.cpp
- src/result.hpp
Summary
Reshapes the verifier's analysis API to thread per-analysis state explicitly via a new
AnalysisContextstruct (program_info,options,platform), and corrects the modeling ofVariableRegistryas a global name-interning service rather than per-analysis state.The branch grew through several refactor steps that, taken together, replace the implicit thread-local environment with explicit parameters wherever the data is genuinely per-analysis, and clarify the responsibilities of the registry where it isn't.
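The aggregate at the center of the change can be sketched as below. The field types are illustrative stand-ins (the real `ProgramInfo`, options, and platform types live in the repo); what the sketch shows is the shape the PR describes: exactly the three per-analysis inputs, held non-owning, with no `VariableRegistry` member:

```cpp
#include <string>

// Stand-ins for the repo's ProgramInfo / ebpf_verifier_options_t / platform types.
struct ProgramInfo { std::string name; };
struct Options { int max_loop_count = 100000; };
struct Platform { std::string name; };

struct AnalysisContext {
    const ProgramInfo& program_info; // non-owning: the caller keeps these alive
    const Options& options;          // for the duration of the analysis
    const Platform& platform;
    // Deliberately no VariableRegistry member: interning is a global service.
};

int analyze(const AnalysisContext& ctx) {
    // Callees read ctx.options instead of a thread_local.
    return ctx.options.max_loop_count;
}
```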
Why
Before this branch, the analyzer reached for
`thread_local_options`, `thread_local_program_info`, and `variable_registry` from arbitrary depths in the call graph. That made the dependencies of `EbpfDomain`, `EbpfChecker`, and `EbpfTransformer` opaque.
Two semantic bugs surfaced from that confusion:
- `EbpfDomain::get_loop_count_upper_bound()` queried `variable_registry.get_loop_counters()`, which after the registry-as-global reframing accumulated counters across analyses. Stale counters evaluated to top in the current state, sending `max_loop_count` silently to `INT_MAX`.
- `EbpfDomain::from_constraints` could leave `state` bottom while `stack` stayed materialized; `to_set()` then printed stack content instead of `_|_`.
Both are fixed.
What changed
- `AnalysisContext` carries the three things that vary per analysis. `VariableRegistry` deliberately isn't a member: it's a global interning service (no clearing, no `LazyAllocator` wrapper, never threaded as state).
- Context-threaded entry points: `top(context)`/`top(size_t)`, `setup_entry(bool, context)`, both `from_constraints` overloads, `calculate_constant_limits(context, loop_counters)`, `widen(other, to_constants, context, loop_counters)`.
- Sized `BitsetDomain(size_t)`/`ArrayDomain(size_t)` ctors; default ctors removed.
- `EbpfDomain::stack` becomes `std::optional<ArrayDomain>` with the invariant `state.is_bottom() ⇔ !stack.has_value()`. Bottom default-construction needs no thread-local read.
- `BitsetDomain` and `ArrayDomain` lose their fake `is_bottom`/`set_to_bottom` (the bottom concept now lives only at `EbpfDomain`).
- Mutators of `EbpfDomain` (`add_value_constraint`, `assume_eq_types`, `restrict_type`) maintain the invariant via a single `normalize_after_state_mutation` helper.
- Loop counters come from the `Wto` (per-program loop heads), not from the registry.
- Type variables come from `TypeDomain::variables()`/`variables_with_type(type)` (per-state tracked vars), not from `variable_registry.get_type_variables()` (which is removed).
- `VariableRegistry` is now non-copyable, all methods `const` over a `mutable` interning cache, and `make` is renamed to `intern`. `RegPack` and `reg_pack` moved here from `type_to_num.hpp`.
- Other consumers (`extract_instruction_deps`, `RelevantState`, `get_assertions`, `Unmarshaller`, `cfg_builder`, `result.cpp`) take their inputs explicitly.
Known leftover
`EbpfDomain::to_set()` still serializes a bottom domain as top, with a `BUG:` comment, to satisfy the YAML test expectations encoded in `post: []`. The real fix is to return bottom and update the YAML expectations; left as a follow-up.
Test plan
🤖 Generated with Claude Code