
Add tested examples, improvement plan, and updated docs for all Python use cases#2

Draft
Copilot wants to merge 5 commits into main from copilot/explore-use-cases
Copilot AI commented Jun 2, 2026

Summary

Add 6 tested example files covering every Python API surface, rewrite examples/README.md, and create a phased improvement plan based on hands-on exploration.

Type

  • Bug fix
  • New feature / enhancement
  • Detection improvement (prior, oracle, heuristic)
  • Refactor / cleanup
  • Documentation
  • CI / tooling

Description

The existing examples were generic and untested — simple_verification.py showed 3 hardcoded claims, and the README referenced verify_chain(llm_output) with the wrong call signature (missing steps param). New consumers had no way to understand the full API surface or validate their setup.

New examples (examples/)

  • guard_verification.py: ReasoningGuard ECS scoring, threshold tuning, arithmetic repair, degradation tracking
  • chain_of_thought.py: verify_chain multi-step reasoning with contradiction detection
  • arithmetic_solver.py: solve_arithmetic, _safe_eval, _extract_numbers, _detect_operation
  • syllogism_verification.py: verify_syllogism with Z3 + heuristic fallacy detection
  • mcq_picker.py: pick_best_answer with strict/lenient tie handling
  • arithmetic_repair.py: _repair_arithmetic_in_step, answer extraction, majority vote
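
For readers unfamiliar with the idea behind a helper like _safe_eval, the sketch below shows one common way to evaluate plain arithmetic without calling eval(): parse the expression with ast and allow only whitelisted operators. The name safe_eval_sketch and the operator tables are assumptions made for illustration; the project's own _safe_eval may work differently.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval_sketch(expression: str) -> float:
    """Evaluate a plain arithmetic expression (hypothetical helper, not the project's)."""

    def walk(node: ast.AST) -> float:
        # Numeric literals only; booleans are excluded even though bool subclasses int.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Names, calls, attribute access and everything else end up here.
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)
```

Because function calls and names are never evaluated, an input such as "__import__('os')" raises ValueError instead of running.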

Tests

  • tests/test_examples.py — 36 tests covering all example use cases (mocked where Rust binary needed, direct for pure-Python)
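
As a rough illustration of the "mocked where Rust binary needed" approach, the sketch below patches subprocess.run so a test can run without any compiled binary present. The wrapper run_verifier and the binary name hypothetical-verifier are invented for this example and are not taken from the PR; the actual tests may mock at a different layer.

```python
import subprocess
import unittest
from unittest import mock


def run_verifier(claim: str) -> str:
    """Hypothetical wrapper that shells out to a compiled verifier binary."""
    completed = subprocess.run(
        ["hypothetical-verifier", claim],  # assumed binary name, not from the PR
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


class RunVerifierTest(unittest.TestCase):
    def test_returns_trimmed_stdout_without_the_binary(self):
        # Canned result standing in for the real process output.
        fake = subprocess.CompletedProcess(
            args=[], returncode=0, stdout="valid\n", stderr=""
        )
        with mock.patch("subprocess.run", return_value=fake) as patched:
            self.assertEqual(run_verifier("2 + 2 = 4"), "valid")
        # The wrapper should have invoked the (patched) subprocess exactly once.
        patched.assert_called_once()
```

Saved in a file under tests/, a case like this is picked up by the unittest discover command listed in the Testing section.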

Docs

  • examples/README.md — Full rewrite: per-use-case sections with code, expected output, edge cases, install prerequisites per example group
  • README.md — Fixed wrong verify_chain signature, added optional dependency install docs, linked new examples table
  • docs/IMPROVEMENT-PLAN.md — Findings from exploration + 5-phase roadmap (robustness → test coverage → API ergonomics → performance)

Bugs fixed

  • README showed verify_chain(llm_output) — correct signature is verify_chain(problem, steps)
  • README referenced result.ecs on EpistemicChainReport (doesn't exist; it's result.chain_confidence)
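
The corrected call shape can be sketched as follows. Only the names verify_chain, EpistemicChainReport, and chain_confidence come from this PR; the stand-in definitions, the placeholder confidence value, and the sample problem are assumptions so the snippet runs without the package installed (the real import path is not shown here).

```python
from dataclasses import dataclass


@dataclass
class EpistemicChainReport:
    """Stand-in for the real report type; only the field named in this PR."""

    chain_confidence: float


def verify_chain(problem: str, steps: list[str]) -> EpistemicChainReport:
    """Stand-in with the corrected two-argument signature.

    The returned confidence is a placeholder, not a real score.
    """
    return EpistemicChainReport(chain_confidence=0.0)


# Illustrative input: the problem statement and the reasoning steps are
# passed separately rather than as one blob of LLM output.
problem = "A train travels 60 km in 1.5 hours. What is its average speed?"
steps = [
    "Average speed is distance divided by time.",
    "60 / 1.5 = 40",
    "The average speed is 40 km/h.",
]

result = verify_chain(problem, steps)  # not verify_chain(llm_output)
print(result.chain_confidence)  # not result.ecs
```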

CI fixes

  • Fixed ruff F541 lint error: removed extraneous f prefix from f" ---" in examples/syllogism_verification.py
  • Fixed ruff RUF059 lint error: renamed unused unpacked variable report to _report in tests/test_examples.py
  • Fixed ruff formatting issues in examples/arithmetic_solver.py, examples/guard_verification.py, and tests/test_examples.py
  • Added missing click dependency to CI install step in .github/workflows/lint.yml (required by spacy but not resolved transitively)

Testing

  • Python unit tests pass: python3 -m unittest discover -s tests -p "test_*.py" -v
  • Ruff clean: ruff check . && ruff format --check .
  • Rust tests pass (if Rust code changed): cargo test
  • Benchmark regression checked (if detection logic changed):
    python3 benchmarks/run_benchmarks.py

Benchmark impact (detection PRs only)

N/A — no detection logic changed.

Checklist

  • No LLM, API key, or external model dependency introduced
  • All Python files ≤ 400 lines (exceptions need justification)
  • Public API documented (docstrings / /// for Rust)
  • No commented-out code left in
  • .github/CONTRIBUTING.md consulted for code style guidelines

Related issues

Copilot AI linked an issue Jun 2, 2026 that may be closed by this pull request
Copilot AI added 3 commits June 2, 2026 07:18
- Add 6 new example files covering all Python use cases:
  guard_verification.py, chain_of_thought.py, arithmetic_solver.py,
  syllogism_verification.py, mcq_picker.py, arithmetic_repair.py
- Add tests/test_examples.py with 36 tests validating all examples
- Rewrite examples/README.md with per-use-case documentation
- Create docs/IMPROVEMENT-PLAN.md with findings and roadmap
- Update README.md with accurate API references, install docs,
  and links to new examples and improvement plan
Copilot AI changed the title from "[WIP] Update use cases with detailed report and examples" to "Add tested examples, improvement plan, and updated docs for all Python use cases" Jun 2, 2026
Copilot AI requested a review from sorunokoe June 2, 2026 07:27
@sorunokoe
Owner

@copilot Fix issues in CI.

Author

Copilot AI commented Jun 2, 2026

@copilot Fix issues in CI.

Fixed all CI issues:

  1. Ruff F541 — removed extraneous f prefix from f" ---" in examples/syllogism_verification.py:109
  2. Ruff RUF059 — renamed unused report to _report in tests/test_examples.py:311
  3. Ruff format — auto-formatted 3 files that had formatting drift
  4. Missing click — added click to the CI install step in .github/workflows/lint.yml (spacy requires it but wasn't resolving it transitively)

Development

Successfully merging this pull request may close these issues.

Explore use cases

2 participants