Skip to content

Add P2R, R2P, and Mixed implication types with benchmark suite#4

Open
AvitalParasha wants to merge 12 commits into
lucasmt:masterfrom
AvitalParasha:mixed-types
Open

Add P2R, R2P, and Mixed implication types with benchmark suite#4
AvitalParasha wants to merge 12 commits into
lucasmt:masterfrom
AvitalParasha:mixed-types

Conversation

@AvitalParasha
Copy link
Copy Markdown

Summary

  • Implement P2R (persistence-to-recurrence, FG→GF) and mixed implication type support in the CycleCover solver
  • Add comprehensive benchmark suite with scalable generators (NoC routers, cleaning robots, power grid, railway signaling, etc.) across R2R, R2P, P2R, and Mixed categories
  • Add automated comparison infrastructure against Strix and Spot (ltlsynt), with Slurm-based parallel execution on HPC and SVG/HTML result visualization

Key changes

  • Solver: P2R support in CycleCover.cpp, mixed R2R+R2P/P2R solving, joint state-dependent alive region for multiple implications
  • Benchmarks: 4 categories (R2R, R2P, P2R, Mixed), each with multiple families and scalable instance generators (Python scripts)
  • Infrastructure: compare_results_HPC.sh (Slurm job orchestration), generate_table_images.py (SVG tables + HTML slideshow), per-tool runtime comparison with speedup calculation

Test plan

  • make compiles cleanly on Linux (HPC)
  • bash benchmarks/run_all_tests.sh passes all categories including Mixed
  • bash benchmarks/compare_results_HPC.sh --force all runs full Slurm comparison without mismatches
  • HTML slideshow renders correctly with all categories

🤖 Generated with Claude Code

Avital Parasha and others added 12 commits April 5, 2026 20:55
- Restructure benchmarks/ into R2R/ (GF->GF) and R2P/ (GF->FG) directories
- Move existing benchmark families (cleaning_robots, multi_mode,
  railway_signaling, rotating_robots) into R2R/
- Create scalable R2P benchmark generators: stabilize_lock, forced_oscillation,
  alive_region_pruning with Python generators and shell scripts (i=1..10)
- Add detailed README.md for each R2P example with i=1 and i=2 walkthroughs,
  full specification breakdowns, state space analysis, and winning strategies
- Move generator scripts into their respective R2R/ or R2P/ directories,
  keep shared utilities (formulas.py, to_strix.py) at benchmarks/ level
- Add test runner scripts: run_r2r_tests.sh, run_r2p_tests.sh, run_all_tests.sh
- Fix all scripts for Python 3 compatibility and spaces-in-path quoting
- Update to_strix.sh to accept R2R/R2P category argument
- Update main README with new directory structure, generation and test
  instructions, and R2P fairness pattern documentation
- Add one_example_walkthrough.md with detailed cleaning_robots walkthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add .gitignore for build artifacts (bin/, obj/), generated parser
  files, .DS_Store, and tool output files
- Add reference papers (FMCAD26_Separation.pdf, Adapting_Behaviors)
- Add project_md.md with the R2P implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add R2P version of cleaning_robots benchmark (GF->FG fairness pattern),
  adapted from the R2R version with persistence guarantees
- Add sgrk_r2p.py formatting helper for R2P specifications
- Add cleaning_robots.py generator and shell script (i=1..10)
- Add detailed README.md with i=1 and i=2 walkthroughs, R2R vs R2P
  comparison, and realizability argument
- Add installation_changes_made.txt to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scaling parameter i added no meaningful complexity — each bit was
an independent copy of the same trivial contradiction. Removed the
generator scripts and files 2-10, keeping only the single spec file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the synthesis tool to handle specifications where assumptions
use persistence (FG) and guarantees use recurrence (GF). This is the
dual of the existing R2P support.

- Add P2R to ImplicationType enum
- Add parser grammar rules for FG assumptions (in_persistences)
- Generalize ComputeAliveRegion to work with both input and output
  primed variables, enabling environment alive region computation
- Add P2R branches in ComputeCoveredRegion and ComputeCycleStrategy
- Update mixed-type validation error message
- Update README with P2R documentation
- Fix test script grep pattern for new error message format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce full P2R (persistence-to-recurrence, FG→GF) benchmark suite:
- cleaning_robots: parameterized generator (1–10 rooms) with .sgrk outputs
- responsive_toggle: parameterized generator (1–10 bits) with .sgrk outputs
- forced_stability: single-instance benchmark
- sgrk_p2r.py: P2R-specific formatting helper
- run_p2r_tests.sh: realizability test runner with mixed-type rejection test

Add Strix comparison data for all R2P benchmarks:
- run_strix_r2p.sh: converts R2P .sgrk files to Strix format and measures runtime
- strix_example/ directories with .sgrk.strix translations and runtime.txt for
  alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock
- Direct .sgrk.strix conversions at benchmark top level

Add unified cross-tool benchmark infrastructure:
- compare_results.sh: side-by-side sgrk vs Strix comparison with caching
- run_benchmarks.sh: unified sgrk runner across R2R/R2P/P2R categories
- run_strix.sh: unified Strix runner across all formula types

Record runtime measurements for all existing benchmarks:
- R2R: cleaning_robots, multi_mode, railway_signaling_2/3, rotating_robots
- R2P: alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock

Update run_all_tests.sh to include P2R tests and mixed P2R/R2R rejection.
Update README.md with P2R directory structure, generators, and test commands.
Update .gitignore with thesis document exclusions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eration

- Enhanced compare_results.sh:
  - Result caching: reuses existing runtime/strix_runtime files by default,
    only runs missing tests. Use --force to re-run everything.
  - Incremental saving: writes each result to disk immediately after completion,
    so partial runs are preserved if the process is interrupted.
  - LaTeX table generation: produces per-family .tex files and category
    summary.tex in benchmarks/results/<category>/.
  - Auto-generates SVG images and HTML slideshow via generate_table_images.py.
  - Default timeout set to 90 minutes for both sgrk and Strix.
  - Fixed bash/zsh compatibility (tr instead of ${,,} for case conversion).

- New generate_table_images.py script:
  - Reads runtime/strix_runtime files and produces SVG comparison tables
    (per-family detail + category summary) using pure Python (no dependencies).
  - Generates an interactive HTML slideshow (benchmarks/results/slideshow.html)
    with keyboard navigation and clickable tabs.
  - Only generates tables for families that have both sgrk and Strix data.

- Strix runtime results for R2P and P2R benchmarks:
  - R2P: 22 MATCH, 0 MISMATCH, 9 SKIP (Strix timeouts on larger instances).
    sgrk 5x-18,700x faster where both complete.
  - P2R: 15 MATCH, 0 MISMATCH, 6 SKIP (Strix timeouts on larger instances).
    sgrk 2x-13,000x faster where both complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The committed R2P implementation computed alive regions independently per
implication. This is incorrect when multiple R2P implications share output
variables — their alive regions interact through the safety constraint, and
per-implication regions can be non-empty while the joint region is empty.

The fix:
1. Combines all R2P implications into a single demanded predicate:
   demanded(in, out) = ∧_i (¬can_cycle(A_i) ∨ g_i(out))
   This captures the joint requirement conditioned on which assumptions
   the environment can satisfy from each input state.

2. Makes the alive region input-dependent via a new VarMgr operation
   OutputUnprimedToPrimed that substitutes only output variables to
   primed, leaving input variables as identity. The alive region becomes
   alive(in, out) — parameterized by input state.

3. Uses output-only quantification throughout the fixpoint and
   reachability checks, preserving the input parameterization.

4. Produces a single merged path strategy for all R2P implications
   instead of one per implication.

This gives smaller BDDs during the fixpoint (n+m free variables instead
of 2n+m) and requires no post-processing to unprime input variables.

Also adds power_grid_3state benchmark (unrealizable R2P example where
the system must commit to an absorbing output state before knowing which
implication the environment will activate), P2R Strix translations,
updated runtime results, and R2R Strix comparison data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes:

1. Refactor: Extract ComputeR2PAliveRegion shared method
   Steps 1-5 of the R2P algorithm (build demanded predicate, compute
   state-dependent alive region) were duplicated in ComputeCoveredRegion
   and ComputeCycleStrategy. Extracted into a single ComputeR2PAliveRegion
   method that both call.

2. Fix: Change AND to OR in ComputeR2PPathStrategy realizable region
   The two-phase R2P path strategy (reach alive, then maintain) had its
   realizable region computed as reach & maintain. Since maintain is only
   realizable at alive states, the AND collapsed to just alive — making
   the reach phase dead code. The cycle strategy would never activate it
   from states outside alive, falling back to the idle strategy instead.
   Changed to OR so that both phases contribute: the reach phase activates
   from states outside alive, the maintain phase from states inside.
   Regression test: benchmarks/R2P/strategy_bug/ (with --play verification).

3. Add NoC enforcement benchmarks (from industry NoC specifications)
   Three R2P examples derived from Network-on-Chip enforcement specs
   (QoS priority bounding, bus lock violation, BIST reconfiguration).
   noc_formulas_4_13 and noc_enforcement_no46 expose the per-implication
   alive region bug from the previous commit — the old code incorrectly
   said REALIZABLE, both Strix and the fix say UNREALIZABLE.

Also adds:
- benchmarks/verify_strategies.sh: plays realizable specs with worst-case
  inputs and checks FG guarantees stabilize / GF guarantees recur
- R2P algorithm documentation (explaining_R2P_solution.md, explaining_R2P_committed.md)
- P2R commitment_conflict base example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Spot (ltlsynt) as third comparison tool alongside Strix
- Add NoC router benchmark families for R2R, R2P, and P2R
- Add new benchmarks: rotating_robots (R2R), sensor_network (P2R),
  commitment_conflict (P2R), strategy_bug (R2P)
- Refactor compare_results.sh with Spot support and improved caching
- Enhance generate_table_images.py with per-family SVG and summary tables
- Add run_single_family.sh, run_spot.sh, compare_results_HPC.sh scripts
- Update runtime/strix_runtime caches with latest measurements
- Remove obsolete noc_enforcement benchmark
- Regenerate all SVG result tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement mixed R2R+{R2P,P2R} solving in CycleCover, reorganize mixed
benchmarks into their own category (Mixed/), and update all runner
scripts to include the Mixed category in benchmarking and comparison.
Include updated benchmark results, NoC scaling SVGs, and Mixed results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant