Add P2R, R2P, and Mixed implication types with benchmark suite#4
Open
AvitalParasha wants to merge 12 commits into
Open
Add P2R, R2P, and Mixed implication types with benchmark suite#4AvitalParasha wants to merge 12 commits into
AvitalParasha wants to merge 12 commits into
Conversation
- Restructure benchmarks/ into R2R/ (GF->GF) and R2P/ (GF->FG) directories - Move existing benchmark families (cleaning_robots, multi_mode, railway_signaling, rotating_robots) into R2R/ - Create scalable R2P benchmark generators: stabilize_lock, forced_oscillation, alive_region_pruning with Python generators and shell scripts (i=1..10) - Add detailed README.md for each R2P example with i=1 and i=2 walkthroughs, full specification breakdowns, state space analysis, and winning strategies - Move generator scripts into their respective R2R/ or R2P/ directories, keep shared utilities (formulas.py, to_strix.py) at benchmarks/ level - Add test runner scripts: run_r2r_tests.sh, run_r2p_tests.sh, run_all_tests.sh - Fix all scripts for Python 3 compatibility and spaces-in-path quoting - Update to_strix.sh to accept R2R/R2P category argument - Update main README with new directory structure, generation and test instructions, and R2P fairness pattern documentation - Add one_example_walkthrough.md with detailed cleaning_robots walkthrough Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add .gitignore for build artifacts (bin/, obj/), generated parser files, .DS_Store, and tool output files - Add reference papers (FMCAD26_Separation.pdf, Adapting_Behaviors) - Add project_md.md with the R2P implementation plan Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add R2P version of cleaning_robots benchmark (GF->FG fairness pattern), adapted from the R2R version with persistence guarantees - Add sgrk_r2p.py formatting helper for R2P specifications - Add cleaning_robots.py generator and shell script (i=1..10) - Add detailed README.md with i=1 and i=2 walkthroughs, R2R vs R2P comparison, and realizability argument - Add installation_changes_made.txt to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scaling parameter i added no meaningful complexity — each bit was an independent copy of the same trivial contradiction. Removed the generator scripts and files 2-10, keeping only the single spec file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the synthesis tool to handle specifications where assumptions use persistence (FG) and guarantees use recurrence (GF). This is the dual of the existing R2P support. - Add P2R to ImplicationType enum - Add parser grammar rules for FG assumptions (in_persistences) - Generalize ComputeAliveRegion to work with both input and output primed variables, enabling environment alive region computation - Add P2R branches in ComputeCoveredRegion and ComputeCycleStrategy - Update mixed-type validation error message - Update README with P2R documentation - Fix test script grep pattern for new error message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce full P2R (persistence-to-recurrence, FG→GF) benchmark suite: - cleaning_robots: parameterized generator (1–10 rooms) with .sgrk outputs - responsive_toggle: parameterized generator (1–10 bits) with .sgrk outputs - forced_stability: single-instance benchmark - sgrk_p2r.py: P2R-specific formatting helper - run_p2r_tests.sh: realizability test runner with mixed-type rejection test Add Strix comparison data for all R2P benchmarks: - run_strix_r2p.sh: converts R2P .sgrk files to Strix format and measures runtime - strix_example/ directories with .sgrk.strix translations and runtime.txt for alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock - Direct .sgrk.strix conversions at benchmark top level Add unified cross-tool benchmark infrastructure: - compare_results.sh: side-by-side sgrk vs Strix comparison with caching - run_benchmarks.sh: unified sgrk runner across R2R/R2P/P2R categories - run_strix.sh: unified Strix runner across all formula types Record runtime measurements for all existing benchmarks: - R2R: cleaning_robots, multi_mode, railway_signaling_2/3, rotating_robots - R2P: alive_region_pruning, cleaning_robots, forced_oscillation, stabilize_lock Update run_all_tests.sh to include P2R tests and mixed P2R/R2R rejection. Update README.md with P2R directory structure, generators, and test commands. Update .gitignore with thesis document exclusions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eration
- Enhanced compare_results.sh:
- Result caching: reuses existing runtime/strix_runtime files by default,
only runs missing tests. Use --force to re-run everything.
- Incremental saving: writes each result to disk immediately after completion,
so partial runs are preserved if the process is interrupted.
- LaTeX table generation: produces per-family .tex files and category
summary.tex in benchmarks/results/<category>/.
- Auto-generates SVG images and HTML slideshow via generate_table_images.py.
- Default timeout set to 90 minutes for both sgrk and Strix.
- Fixed bash/zsh compatibility (tr instead of ${,,} for case conversion).
- New generate_table_images.py script:
- Reads runtime/strix_runtime files and produces SVG comparison tables
(per-family detail + category summary) using pure Python (no dependencies).
- Generates an interactive HTML slideshow (benchmarks/results/slideshow.html)
with keyboard navigation and clickable tabs.
- Only generates tables for families that have both sgrk and Strix data.
- Strix runtime results for R2P and P2R benchmarks:
- R2P: 22 MATCH, 0 MISMATCH, 9 SKIP (Strix timeouts on larger instances).
sgrk 5x-18,700x faster where both complete.
- P2R: 15 MATCH, 0 MISMATCH, 6 SKIP (Strix timeouts on larger instances).
sgrk 2x-13,000x faster where both complete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The committed R2P implementation computed alive regions independently per implication. This is incorrect when multiple R2P implications share output variables — their alive regions interact through the safety constraint, and per-implication regions can be non-empty while the joint region is empty. The fix: 1. Combines all R2P implications into a single demanded predicate: demanded(in, out) = ∧_i (¬can_cycle(A_i) ∨ g_i(out)) This captures the joint requirement conditioned on which assumptions the environment can satisfy from each input state. 2. Makes the alive region input-dependent via a new VarMgr operation OutputUnprimedToPrimed that substitutes only output variables to primed, leaving input variables as identity. The alive region becomes alive(in, out) — parameterized by input state. 3. Uses output-only quantification throughout the fixpoint and reachability checks, preserving the input parameterization. 4. Produces a single merged path strategy for all R2P implications instead of one per implication. This gives smaller BDDs during the fixpoint (n+m free variables instead of 2n+m) and requires no post-processing to unprime input variables. Also adds power_grid_3state benchmark (unrealizable R2P example where the system must commit to an absorbing output state before knowing which implication the environment will activate), P2R Strix translations, updated runtime results, and R2R Strix comparison data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes: 1. Refactor: Extract ComputeR2PAliveRegion shared method Steps 1-5 of the R2P algorithm (build demanded predicate, compute state-dependent alive region) were duplicated in ComputeCoveredRegion and ComputeCycleStrategy. Extracted into a single ComputeR2PAliveRegion method that both call. 2. Fix: Change AND to OR in ComputeR2PPathStrategy realizable region The two-phase R2P path strategy (reach alive, then maintain) had its realizable region computed as reach & maintain. Since maintain is only realizable at alive states, the AND collapsed to just alive — making the reach phase dead code. The cycle strategy would never activate it from states outside alive, falling back to the idle strategy instead. Changed to OR so that both phases contribute: the reach phase activates from states outside alive, the maintain phase from states inside. Regression test: benchmarks/R2P/strategy_bug/ (with --play verification). 3. Add NoC enforcement benchmarks (from industry NoC specifications) Three R2P examples derived from Network-on-Chip enforcement specs (QoS priority bounding, bus lock violation, BIST reconfiguration). noc_formulas_4_13 and noc_enforcement_no46 expose the per-implication alive region bug from the previous commit — the old code incorrectly said REALIZABLE, both Strix and the fix say UNREALIZABLE. Also adds: - benchmarks/verify_strategies.sh: plays realizable specs with worst-case inputs and checks FG guarantees stabilize / GF guarantees recur - R2P algorithm documentation (explaining_R2P_solution.md, explaining_R2P_committed.md) - P2R commitment_conflict base example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Spot (ltlsynt) as third comparison tool alongside Strix - Add NoC router benchmark families for R2R, R2P, and P2R - Add new benchmarks: rotating_robots (R2R), sensor_network (P2R), commitment_conflict (P2R), strategy_bug (R2P) - Refactor compare_results.sh with Spot support and improved caching - Enhance generate_table_images.py with per-family SVG and summary tables - Add run_single_family.sh, run_spot.sh, compare_results_HPC.sh scripts - Update runtime/strix_runtime caches with latest measurements - Remove obsolete noc_enforcement benchmark - Regenerate all SVG result tables Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement mixed R2R+{R2P,P2R} solving in CycleCover, reorganize mixed
benchmarks into their own category (Mixed/), and update all runner
scripts to include the Mixed category in benchmarking and comparison.
Include updated benchmark results, NoC scaling SVGs, and Mixed results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Key changes
CycleCover.cpp, mixed R2R+R2P/P2R solving, joint state-dependent alive region for multiple implicationscompare_results_HPC.sh(Slurm job orchestration),generate_table_images.py(SVG tables + HTML slideshow), per-tool runtime comparison with speedup calculationTest plan
makecompiles cleanly on Linux (HPC)bash benchmarks/run_all_tests.shpasses all categories including Mixedbash benchmarks/compare_results_HPC.sh --force allruns full Slurm comparison without mismatches🤖 Generated with Claude Code