Skip to content

Add R2P support with benchmarks and reorganize into R2R/R2P#3

Open
avitalpresearch-droid wants to merge 10 commits into
lucasmt:masterfrom
avitalpresearch-droid:r2p-support
Open

Add R2P support with benchmarks and reorganize into R2R/R2P#3
avitalpresearch-droid wants to merge 10 commits into
lucasmt:masterfrom
avitalpresearch-droid:r2p-support

Conversation

@avitalpresearch-droid
Copy link
Copy Markdown

@avitalpresearch-droid avitalpresearch-droid commented Apr 5, 2026

Latest: Refactor R2P alive region, fix strategy reach phase, add NoC benchmarks

1. Refactor: Extract ComputeR2PAliveRegion shared method
Steps 1–5 of the R2P algorithm (build joint demanded predicate, compute state-dependent alive region) were duplicated in ComputeCoveredRegion and ComputeCycleStrategy. Extracted into a single ComputeR2PAliveRegion method that both call.

2. Fix: AND → OR in ComputeR2PPathStrategy realizable region
The two-phase R2P path strategy (reach alive, then maintain) had its realizable region computed as reach & maintain. Since maintain is only realizable at alive states, the AND collapsed to just alive — making the reach phase dead code. The cycle strategy would never activate it from states outside alive, falling back to idle instead of driving toward alive. Changed to OR so both phases contribute.

Verified via --play: with AND, the system stays at (x=0, y=0) forever (idle, FG(y) violated); with OR, it reaches (x=0, y=1) by step 2 and stabilizes (FG(y) satisfied).

3. Add NoC enforcement benchmarks
Three R2P examples derived from industry Network-on-Chip enforcement specifications (QoS priority bounding, bus lock violation, BIST reconfiguration, fault containment). noc_formulas_4_13 and noc_enforcement_no46 expose the per-implication alive region bug — old code said REALIZABLE, both Strix and the fix say UNREALIZABLE.

4. Strategy verification script (benchmarks/verify_strategies.sh)
For each realizable spec, plays the game with worst-case inputs and checks that FG guarantees stabilize and GF guarantees recur. Catches the reach-phase bug: strategy_bug.sgrk FAIL with AND, PASS with OR.


Previous: Fix R2P joint state-dependent alive region

The committed R2P implementation computed alive regions independently per
implication. This is incorrect when multiple R2P implications share output
variables — their alive regions interact through the safety constraint, and
per-implication regions can be non-empty while the joint region is empty.

The fix:

  1. Combines all R2P implications into a single demanded predicate:
    demanded(in, out) = ∧_i (¬can_cycle(A_i) ∨ g_i(out))
  2. Makes the alive region input-dependent via OutputUnprimedToPrimed
    (primes only output vars, leaves input as identity)
  3. Uses output-only quantification throughout the fixpoint
  4. Produces a single merged path strategy for all R2P implications

Gives smaller BDDs (n+m free variables instead of 2n+m) and requires
no post-processing to unprime input variables.


Summary

  • Reorganize benchmarks into R2R, R2P, P2R directories
  • Add R2P support: parser, CycleCover, mixed-type rejection
  • Fix R2P alive region: joint state-dependent computation
  • Fix R2P strategy: AND → OR in path strategy realizable region
  • Refactor: shared ComputeR2PAliveRegion method
  • Add P2R support: parser, CycleCover
  • Benchmarks: stabilize_lock, forced_oscillation, alive_region_pruning, cleaning_robots, power_grid, noc_enforcement, strategy_bug (R2P); cleaning_robots, responsive_toggle, forced_stability, commitment_conflict (P2R)
  • Strix comparison: full infrastructure with caching, LaTeX/SVG tables
  • Strategy verification: verify_strategies.sh for play-based regression testing

Test plan

  • R2P: 40 passed, 0 failed (including noc_enforcement unrealizable + strategy_bug realizable)
  • R2R/P2R: all pass
  • Mixed type specs correctly rejected
  • R2P Strix comparison: sgrk matches Strix on all examples
  • noc_formulas_4_13: old code REALIZABLE (wrong), fix UNREALIZABLE (matches Strix)
  • strategy_bug: old code strategy stuck at idle, fix reaches alive by step 2
  • verify_strategies.sh: strategy_bug PASS after fix, FAIL with old AND

🤖 Generated with Claude Code

Avital Parasha and others added 10 commits April 5, 2026 20:55
- Restructure benchmarks/ into R2R/ (GF->GF) and R2P/ (GF->FG) directories
- Move existing benchmark families (cleaning_robots, multi_mode,
  railway_signaling, rotating_robots) into R2R/
- Create scalable R2P benchmark generators: stabilize_lock, forced_oscillation,
  alive_region_pruning with Python generators and shell scripts (i=1..10)
- Add detailed README.md for each R2P example with i=1 and i=2 walkthroughs,
  full specification breakdowns, state space analysis, and winning strategies
- Move generator scripts into their respective R2R/ or R2P/ directories,
  keep shared utilities (formulas.py, to_strix.py) at benchmarks/ level
- Add test runner scripts: run_r2r_tests.sh, run_r2p_tests.sh, run_all_tests.sh
- Fix all scripts for Python 3 compatibility and spaces-in-path quoting
- Update to_strix.sh to accept R2R/R2P category argument
- Update main README with new directory structure, generation and test
  instructions, and R2P fairness pattern documentation
- Add one_example_walkthrough.md with detailed cleaning_robots walkthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add .gitignore for build artifacts (bin/, obj/), generated parser
  files, .DS_Store, and tool output files
- Add reference papers (FMCAD26_Separation.pdf, Adapting_Behaviors)
- Add project_md.md with the R2P implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add R2P version of cleaning_robots benchmark (GF->FG fairness pattern),
  adapted from the R2R version with persistence guarantees
- Add sgrk_r2p.py formatting helper for R2P specifications
- Add cleaning_robots.py generator and shell script (i=1..10)
- Add detailed README.md with i=1 and i=2 walkthroughs, R2R vs R2P
  comparison, and realizability argument
- Add installation_changes_made.txt to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scaling parameter i added no meaningful complexity — each bit was
an independent copy of the same trivial contradiction. Removed the
generator scripts and files 2-10, keeping only the single spec file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the synthesis tool to handle specifications where assumptions
use persistence (FG) and guarantees use recurrence (GF). This is the
dual of the existing R2P support.

- Add P2R to ImplicationType enum
- Add parser grammar rules for FG assumptions (in_persistences)
- Generalize ComputeAliveRegion to work with both input and output
  primed variables, enabling environment alive region computation
- Add P2R branches in ComputeCoveredRegion and ComputeCycleStrategy
- Update mixed-type validation error message
- Update README with P2R documentation
- Fix test script grep pattern for new error message format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce full P2R (persistence-to-recurrence, FG→GF) benchmark suite:
- cleaning_robots: parameterized generator (1–10 rooms) with .sgrk outputs
- responsive_toggle: parameterized generator (1–10 bits) with .sgrk outputs
- forced_stability: single-instance benchmark
- sgrk_p2r.py: P2R-specific formatting helper
- run_p2r_tests.sh: realizability test runner with mixed-type rejection test

Add Strix comparison data for all R2P benchmarks:
- run_strix_r2p.sh: converts R2P .sgrk files to Strix format and measures runtime
- strix_example/ directories with .sgrk.strix translations and runtime.txt for
  alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock
- Direct .sgrk.strix conversions at benchmark top level

Add unified cross-tool benchmark infrastructure:
- compare_results.sh: side-by-side sgrk vs Strix comparison with caching
- run_benchmarks.sh: unified sgrk runner across R2R/R2P/P2R categories
- run_strix.sh: unified Strix runner across all formula types

Record runtime measurements for all existing benchmarks:
- R2R: cleaning_robots, multi_mode, railway_signaling_2/3, rotating_robots
- R2P: alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock

Update run_all_tests.sh to include P2R tests and mixed P2R/R2R rejection.
Update README.md with P2R directory structure, generators, and test commands.
Update .gitignore with thesis document exclusions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eration

- Enhanced compare_results.sh:
  - Result caching: reuses existing runtime/strix_runtime files by default,
    only runs missing tests. Use --force to re-run everything.
  - Incremental saving: writes each result to disk immediately after completion,
    so partial runs are preserved if the process is interrupted.
  - LaTeX table generation: produces per-family .tex files and category
    summary.tex in benchmarks/results/<category>/.
  - Auto-generates SVG images and HTML slideshow via generate_table_images.py.
  - Default timeout set to 90 minutes for both sgrk and Strix.
  - Fixed bash/zsh compatibility (tr instead of ${,,} for case conversion).

- New generate_table_images.py script:
  - Reads runtime/strix_runtime files and produces SVG comparison tables
    (per-family detail + category summary) using pure Python (no dependencies).
  - Generates an interactive HTML slideshow (benchmarks/results/slideshow.html)
    with keyboard navigation and clickable tabs.
  - Only generates tables for families that have both sgrk and Strix data.

- Strix runtime results for R2P and P2R benchmarks:
  - R2P: 22 MATCH, 0 MISMATCH, 9 SKIP (Strix timeouts on larger instances).
    sgrk 5x-18,700x faster where both complete.
  - P2R: 15 MATCH, 0 MISMATCH, 6 SKIP (Strix timeouts on larger instances).
    sgrk 2x-13,000x faster where both complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The committed R2P implementation computed alive regions independently per
implication. This is incorrect when multiple R2P implications share output
variables — their alive regions interact through the safety constraint, and
per-implication regions can be non-empty while the joint region is empty.

The fix:
1. Combines all R2P implications into a single demanded predicate:
   demanded(in, out) = ∧_i (¬can_cycle(A_i) ∨ g_i(out))
   This captures the joint requirement conditioned on which assumptions
   the environment can satisfy from each input state.

2. Makes the alive region input-dependent via a new VarMgr operation
   OutputUnprimedToPrimed that substitutes only output variables to
   primed, leaving input variables as identity. The alive region becomes
   alive(in, out) — parameterized by input state.

3. Uses output-only quantification throughout the fixpoint and
   reachability checks, preserving the input parameterization.

4. Produces a single merged path strategy for all R2P implications
   instead of one per implication.

This gives smaller BDDs during the fixpoint (n+m free variables instead
of 2n+m) and requires no post-processing to unprime input variables.

Also adds power_grid_3state benchmark (unrealizable R2P example where
the system must commit to an absorbing output state before knowing which
implication the environment will activate), P2R Strix translations,
updated runtime results, and R2R Strix comparison data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes:

1. Refactor: Extract ComputeR2PAliveRegion shared method
   Steps 1-5 of the R2P algorithm (build demanded predicate, compute
   state-dependent alive region) were duplicated in ComputeCoveredRegion
   and ComputeCycleStrategy. Extracted into a single ComputeR2PAliveRegion
   method that both call.

2. Fix: Change AND to OR in ComputeR2PPathStrategy realizable region
   The two-phase R2P path strategy (reach alive, then maintain) had its
   realizable region computed as reach & maintain. Since maintain is only
   realizable at alive states, the AND collapsed to just alive — making
   the reach phase dead code. The cycle strategy would never activate it
   from states outside alive, falling back to the idle strategy instead.
   Changed to OR so that both phases contribute: the reach phase activates
   from states outside alive, the maintain phase from states inside.
   Regression test: benchmarks/R2P/strategy_bug/ (with --play verification).

3. Add NoC enforcement benchmarks (from industry NoC specifications)
   Three R2P examples derived from Network-on-Chip enforcement specs
   (QoS priority bounding, bus lock violation, BIST reconfiguration).
   noc_formulas_4_13 and noc_enforcement_no46 expose the per-implication
   alive region bug from the previous commit — the old code incorrectly
   said REALIZABLE, both Strix and the fix say UNREALIZABLE.

Also adds:
- benchmarks/verify_strategies.sh: plays realizable specs with worst-case
  inputs and checks FG guarantees stabilize / GF guarantees recur
- R2P algorithm documentation (explaining_R2P_solution.md, explaining_R2P_committed.md)
- P2R commitment_conflict base example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant