Add tested examples, improvement plan, and updated docs for all Python use cases #2
Draft
Copilot wants to merge 5 commits into
Open
- Add 6 new example files covering all Python use cases: guard_verification.py, chain_of_thought.py, arithmetic_solver.py, syllogism_verification.py, mcq_picker.py, arithmetic_repair.py
- Add tests/test_examples.py with 36 tests validating all examples
- Rewrite examples/README.md with per-use-case documentation
- Create docs/IMPROVEMENT-PLAN.md with findings and roadmap
- Update README.md with accurate API references, install docs, and links to new examples and improvement plan
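`arithmetic_solver.py` exercises `_safe_eval`. The block below is not that helper's code. It is a minimal, hypothetical sketch (`safe_eval_sketch` and its operator whitelist are assumptions) of the common technique for evaluating arithmetic without calling `eval()`: parse the string with the standard-library `ast` module and walk only whitelisted node types.

```python
import ast
import operator

# Only these operators are allowed; any other node type is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval_sketch(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic on numeric literals without eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Numeric literals only; bool is a subclass of int, so exclude it explicitly.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, ** and everything else end up here.
        raise ValueError(f"disallowed expression: {type(node).__name__}")

    # mode="eval" accepts a single expression, not statements.
    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval_sketch("2 + 3 * 4"))      # 14
print(safe_eval_sketch("-(6 - 10) / 2"))  # 2.0
```

Division by zero still raises `ZeroDivisionError`, and `**` is left out of the whitelist so that inputs like `9**9**9` cannot stall the evaluator.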
Copilot (AI) changed the title from "[WIP] Update use cases with detailed report and examples" to "Add tested examples, improvement plan, and updated docs for all Python use cases" on Jun 2, 2026.
Owner: @copilot Fix issues in CI.
Author: Fixed all CI issues:
Summary
Add 6 tested example files covering every Python API surface, rewrite examples/README.md, and create a phased improvement plan based on hands-on exploration.
Type
Description
The existing examples were generic and untested: `simple_verification.py` showed 3 hardcoded claims, and the README referenced `verify_chain(llm_output)` with the wrong call signature (missing the `steps` param). New consumers had no way to understand the full API surface or validate their setup.
New examples (`examples/`)
- `guard_verification.py`: `ReasoningGuard` ECS scoring, threshold tuning, arithmetic repair, degradation tracking
- `chain_of_thought.py`: `verify_chain` multi-step reasoning with contradiction detection
- `arithmetic_solver.py`: `solve_arithmetic`, `_safe_eval`, `_extract_numbers`, `_detect_operations`
- `syllogism_verification.py`: `verify_syllogism` with Z3 + heuristic fallacy detection
- `mcq_picker.py`: `pick_best_answer` with strict/lenient tie handling
- `arithmetic_repair.py`: `_repair_arithmetic_in_step`, answer extraction, majority vote
Tests
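`tests/test_examples.py` mocks the Rust binary where an example needs it. A minimal sketch of that pattern, assuming a hypothetical `RustVerifierClient` wrapper that shells out through `subprocess`; the class, its `_call_binary` method, and the `VALID` output convention are illustrative only, not this repository's code.

```python
import subprocess
import unittest
from unittest import mock


class RustVerifierClient:
    """Hypothetical wrapper that sends a claim to a compiled verifier binary."""

    def __init__(self, binary_path: str = "verifier") -> None:
        self.binary_path = binary_path

    def _call_binary(self, payload: str) -> str:
        # Shells out to the binary; this is the boundary the test replaces.
        completed = subprocess.run(
            [self.binary_path],
            input=payload,
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout

    def check(self, claim: str) -> bool:
        # Assumed output convention: the binary prints VALID or INVALID.
        return self._call_binary(claim).strip() == "VALID"


class RustVerifierClientTest(unittest.TestCase):
    def test_check_without_binary(self) -> None:
        # patch.object swaps the method for a MagicMock only inside the block,
        # so no Rust build is required to run this test.
        with mock.patch.object(
            RustVerifierClient, "_call_binary", return_value="VALID\n"
        ) as fake_call:
            self.assertTrue(RustVerifierClient().check("2 + 2 = 4"))
        fake_call.assert_called_once_with("2 + 2 = 4")
```

Patching at the subprocess boundary keeps the pure-Python logic in `check` under test while removing the dependency on a compiled artifact.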
- `tests/test_examples.py`: 36 tests covering all example use cases (mocked where the Rust binary is needed, direct for pure-Python code)
Docs
- `examples/README.md`: full rewrite with per-use-case sections covering code, expected output, and edge cases, plus install prerequisites per example group
- `README.md`: fixed the wrong `verify_chain` signature, added optional-dependency install docs, and linked the new examples table
- `docs/IMPROVEMENT-PLAN.md`: findings from exploration plus a 5-phase roadmap (robustness → test coverage → API ergonomics → performance)
Bugs fixed
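Both API-usage bugs fixed in this PR (the one-argument `verify_chain(llm_output)` call and the nonexistent `result.ecs` attribute) are the kind of mistake a signature-aware test double catches. A minimal sketch using `unittest.mock.create_autospec` and a spec-restricted mock; `verify_chain_stub`, `fake_report`, and `is_rejected` are hypothetical names, and the stub mirrors only the documented `verify_chain(problem, steps)` signature, not the real function's behavior.

```python
from unittest import mock


def verify_chain_stub(problem, steps):
    """Hypothetical stand-in mirroring only the verify_chain(problem, steps) signature."""
    raise NotImplementedError("never called; create_autospec only copies the signature")


# Report double exposing only chain_confidence, so a stale `.ecs` access fails.
fake_report = mock.NonCallableMock(spec=["chain_confidence"], chain_confidence=0.9)

# create_autospec copies the stub's signature: wrong-arity calls raise TypeError.
fake_verify_chain = mock.create_autospec(verify_chain_stub, return_value=fake_report)


def is_rejected(call) -> bool:
    """Return True if the call fails the way a signature or spec mismatch does."""
    try:
        call()
    except (TypeError, AttributeError):
        return True
    return False


report = fake_verify_chain("What is 2 + 3?", ["2 + 3 = 5"])  # matches (problem, steps)
print(report.chain_confidence)                              # 0.9
print(is_rejected(lambda: fake_verify_chain("2 + 3 = 5")))  # True: old README call, no steps
print(is_rejected(lambda: report.ecs))                      # True: attribute does not exist
```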
- `verify_chain(llm_output)`: the correct signature is `verify_chain(problem, steps)`
- `result.ecs` on `EpistemicChainReport` (doesn't exist; it's `result.chain_confidence`)
CI fixes
- Removed the `f` prefix from `f" ---"` in `examples/syllogism_verification.py`
- Renamed `report` → `_report` in `tests/test_examples.py`
- `examples/arithmetic_solver.py`, `examples/guard_verification.py`, and `tests/test_examples.py`
- Added the `click` dependency to the CI install step in `.github/workflows/lint.yml` (required by spaCy but not resolved transitively)
Testing
- `python3 -m unittest discover -s tests -p "test_*.py" -v`
- `ruff check . && ruff format --check .`
- `cargo test`
- `python3 benchmarks/run_benchmarks.py`
Benchmark impact (detection PRs only)
N/A: no detection logic changed.
Checklist
- `///` for Rust)
- `.github/CONTRIBUTING.md` consulted for code style guidelines
Related issues