Add R2P support with benchmarks and reorganize into R2R/R2P by avitalpresearch-droid · Pull Request #3 · lucasmt/separation-grk

avitalpresearch-droid · 2026-04-05T22:43:39Z

Latest: Refactor R2P alive region, fix strategy reach phase, add NoC benchmarks

1. Refactor: Extract ComputeR2PAliveRegion shared method
Steps 1–5 of the R2P algorithm (build joint demanded predicate, compute state-dependent alive region) were duplicated in ComputeCoveredRegion and ComputeCycleStrategy. Extracted into a single ComputeR2PAliveRegion method that both call.

2. Fix: AND → OR in ComputeR2PPathStrategy realizable region
The two-phase R2P path strategy (reach alive, then maintain) had its realizable region computed as reach & maintain. Since maintain is only realizable at alive states, the AND collapsed to just alive — making the reach phase dead code. The cycle strategy would never activate it from states outside alive, falling back to idle instead of driving toward alive. Changed to OR so both phases contribute.

Verified via --play: with AND, the system stays at (x=0, y=0) forever (idle, FG(y) violated); with OR, it reaches (x=0, y=1) by step 2 and stabilizes (FG(y) satisfied).

3. Add NoC enforcement benchmarks
Three R2P examples derived from industry Network-on-Chip enforcement specifications (QoS priority bounding, bus lock violation, BIST reconfiguration, fault containment). noc_formulas_4_13 and noc_enforcement_no46 expose the per-implication alive region bug — old code said REALIZABLE, both Strix and the fix say UNREALIZABLE.

4. Strategy verification script (benchmarks/verify_strategies.sh)
For each realizable spec, plays the game with worst-case inputs and checks that FG guarantees stabilize and GF guarantees recur. Catches the reach-phase bug: strategy_bug.sgrk FAIL with AND, PASS with OR.

Previous: Fix R2P joint state-dependent alive region

The committed R2P implementation computed alive regions independently per
implication. This is incorrect when multiple R2P implications share output
variables — their alive regions interact through the safety constraint, and
per-implication regions can be non-empty while the joint region is empty.

The fix:

Combines all R2P implications into a single demanded predicate:
demanded(in, out) = ∧_i (¬can_cycle(A_i) ∨ g_i(out))
Makes the alive region input-dependent via OutputUnprimedToPrimed
(primes only output vars, leaves input as identity)
Uses output-only quantification throughout the fixpoint
Produces a single merged path strategy for all R2P implications

Gives smaller BDDs (n+m free variables instead of 2n+m) and requires
no post-processing to unprime input variables.

Summary

Reorganize benchmarks into R2R, R2P, P2R directories
Add R2P support: parser, CycleCover, mixed-type rejection
Fix R2P alive region: joint state-dependent computation
Fix R2P strategy: AND → OR in path strategy realizable region
Refactor: shared ComputeR2PAliveRegion method
Add P2R support: parser, CycleCover
Benchmarks: stabilize_lock, forced_oscillation, alive_region_pruning, cleaning_robots, power_grid, noc_enforcement, strategy_bug (R2P); cleaning_robots, responsive_toggle, forced_stability, commitment_conflict (P2R)
Strix comparison: full infrastructure with caching, LaTeX/SVG tables
Strategy verification: verify_strategies.sh for play-based regression testing

Test plan

R2P: 40 passed, 0 failed (including noc_enforcement unrealizable + strategy_bug realizable)
R2R/P2R: all pass
Mixed type specs correctly rejected
R2P Strix comparison: sgrk matches Strix on all examples
noc_formulas_4_13: old code REALIZABLE (wrong), fix UNREALIZABLE (matches Strix)
strategy_bug: old code strategy stuck at idle, fix reaches alive by step 2
verify_strategies.sh: strategy_bug PASS after fix, FAIL with old AND

🤖 Generated with Claude Code

- Restructure benchmarks/ into R2R/ (GF->GF) and R2P/ (GF->FG) directories - Move existing benchmark families (cleaning_robots, multi_mode, railway_signaling, rotating_robots) into R2R/ - Create scalable R2P benchmark generators: stabilize_lock, forced_oscillation, alive_region_pruning with Python generators and shell scripts (i=1..10) - Add detailed README.md for each R2P example with i=1 and i=2 walkthroughs, full specification breakdowns, state space analysis, and winning strategies - Move generator scripts into their respective R2R/ or R2P/ directories, keep shared utilities (formulas.py, to_strix.py) at benchmarks/ level - Add test runner scripts: run_r2r_tests.sh, run_r2p_tests.sh, run_all_tests.sh - Fix all scripts for Python 3 compatibility and spaces-in-path quoting - Update to_strix.sh to accept R2R/R2P category argument - Update main README with new directory structure, generation and test instructions, and R2P fairness pattern documentation - Add one_example_walkthrough.md with detailed cleaning_robots walkthrough Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add .gitignore for build artifacts (bin/, obj/), generated parser files, .DS_Store, and tool output files - Add reference papers (FMCAD26_Separation.pdf, Adapting_Behaviors) - Add project_md.md with the R2P implementation plan Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add R2P version of cleaning_robots benchmark (GF->FG fairness pattern), adapted from the R2R version with persistence guarantees - Add sgrk_r2p.py formatting helper for R2P specifications - Add cleaning_robots.py generator and shell script (i=1..10) - Add detailed README.md with i=1 and i=2 walkthroughs, R2R vs R2P comparison, and realizability argument - Add installation_changes_made.txt to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The scaling parameter i added no meaningful complexity — each bit was an independent copy of the same trivial contradiction. Removed the generator scripts and files 2-10, keeping only the single spec file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extends the synthesis tool to handle specifications where assumptions use persistence (FG) and guarantees use recurrence (GF). This is the dual of the existing R2P support. - Add P2R to ImplicationType enum - Add parser grammar rules for FG assumptions (in_persistences) - Generalize ComputeAliveRegion to work with both input and output primed variables, enabling environment alive region computation - Add P2R branches in ComputeCoveredRegion and ComputeCycleStrategy - Update mixed-type validation error message - Update README with P2R documentation - Fix test script grep pattern for new error message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce full P2R (persistence-to-recurrence, FG→GF) benchmark suite: - cleaning_robots: parameterized generator (1–10 rooms) with .sgrk outputs - responsive_toggle: parameterized generator (1–10 bits) with .sgrk outputs - forced_stability: single-instance benchmark - sgrk_p2r.py: P2R-specific formatting helper - run_p2r_tests.sh: realizability test runner with mixed-type rejection test Add Strix comparison data for all R2P benchmarks: - run_strix_r2p.sh: converts R2P .sgrk files to Strix format and measures runtime - strix_example/ directories with .sgrk.strix translations and runtime.txt for alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock - Direct .sgrk.strix conversions at benchmark top level Add unified cross-tool benchmark infrastructure: - compare_results.sh: side-by-side sgrk vs Strix comparison with caching - run_benchmarks.sh: unified sgrk runner across R2R/R2P/P2R categories - run_strix.sh: unified Strix runner across all formula types Record runtime measurements for all existing benchmarks: - R2R: cleaning_robots, multi_mode, railway_signaling_2/3, rotating_robots - R2P: alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock Update run_all_tests.sh to include P2R tests and mixed P2R/R2R rejection. Update README.md with P2R directory structure, generators, and test commands. Update .gitignore with thesis document exclusions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eration - Enhanced compare_results.sh: - Result caching: reuses existing runtime/strix_runtime files by default, only runs missing tests. Use --force to re-run everything. - Incremental saving: writes each result to disk immediately after completion, so partial runs are preserved if the process is interrupted. - LaTeX table generation: produces per-family .tex files and category summary.tex in benchmarks/results/<category>/. - Auto-generates SVG images and HTML slideshow via generate_table_images.py. - Default timeout set to 90 minutes for both sgrk and Strix. - Fixed bash/zsh compatibility (tr instead of ${,,} for case conversion). - New generate_table_images.py script: - Reads runtime/strix_runtime files and produces SVG comparison tables (per-family detail + category summary) using pure Python (no dependencies). - Generates an interactive HTML slideshow (benchmarks/results/slideshow.html) with keyboard navigation and clickable tabs. - Only generates tables for families that have both sgrk and Strix data. - Strix runtime results for R2P and P2R benchmarks: - R2P: 22 MATCH, 0 MISMATCH, 9 SKIP (Strix timeouts on larger instances). sgrk 5x-18,700x faster where both complete. - P2R: 15 MATCH, 0 MISMATCH, 6 SKIP (Strix timeouts on larger instances). sgrk 2x-13,000x faster where both complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The committed R2P implementation computed alive regions independently per implication. This is incorrect when multiple R2P implications share output variables — their alive regions interact through the safety constraint, and per-implication regions can be non-empty while the joint region is empty. The fix: 1. Combines all R2P implications into a single demanded predicate: demanded(in, out) = ∧_i (¬can_cycle(A_i) ∨ g_i(out)) This captures the joint requirement conditioned on which assumptions the environment can satisfy from each input state. 2. Makes the alive region input-dependent via a new VarMgr operation OutputUnprimedToPrimed that substitutes only output variables to primed, leaving input variables as identity. The alive region becomes alive(in, out) — parameterized by input state. 3. Uses output-only quantification throughout the fixpoint and reachability checks, preserving the input parameterization. 4. Produces a single merged path strategy for all R2P implications instead of one per implication. This gives smaller BDDs during the fixpoint (n+m free variables instead of 2n+m) and requires no post-processing to unprime input variables. Also adds power_grid_3state benchmark (unrealizable R2P example where the system must commit to an absorbing output state before knowing which implication the environment will activate), P2R Strix translations, updated runtime results, and R2R Strix comparison data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three changes: 1. Refactor: Extract ComputeR2PAliveRegion shared method Steps 1-5 of the R2P algorithm (build demanded predicate, compute state-dependent alive region) were duplicated in ComputeCoveredRegion and ComputeCycleStrategy. Extracted into a single ComputeR2PAliveRegion method that both call. 2. Fix: Change AND to OR in ComputeR2PPathStrategy realizable region The two-phase R2P path strategy (reach alive, then maintain) had its realizable region computed as reach & maintain. Since maintain is only realizable at alive states, the AND collapsed to just alive — making the reach phase dead code. The cycle strategy would never activate it from states outside alive, falling back to the idle strategy instead. Changed to OR so that both phases contribute: the reach phase activates from states outside alive, the maintain phase from states inside. Regression test: benchmarks/R2P/strategy_bug/ (with --play verification). 3. Add NoC enforcement benchmarks (from industry NoC specifications) Three R2P examples derived from Network-on-Chip enforcement specs (QoS priority bounding, bus lock violation, BIST reconfiguration). noc_formulas_4_13 and noc_enforcement_no46 expose the per-implication alive region bug from the previous commit — the old code incorrectly said REALIZABLE, both Strix and the fix say UNREALIZABLE. Also adds: - benchmarks/verify_strategies.sh: plays realizable specs with worst-case inputs and checks FG guarantees stabilize / GF guarantees recur - R2P algorithm documentation (explaining_R2P_solution.md, explaining_R2P_committed.md) - P2R commitment_conflict base example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Avital Parasha and others added 10 commits April 5, 2026 20:55

Add R2P cleaning_robots to main README

554c51d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add R2P support with benchmarks and reorganize into R2R/R2P#3

Add R2P support with benchmarks and reorganize into R2R/R2P#3
avitalpresearch-droid wants to merge 10 commits into
lucasmt:masterfrom
avitalpresearch-droid:r2p-support

avitalpresearch-droid commented Apr 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avitalpresearch-droid commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Latest: Refactor R2P alive region, fix strategy reach phase, add NoC benchmarks

Previous: Fix R2P joint state-dependent alive region

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

avitalpresearch-droid commented Apr 5, 2026 •

edited

Loading