Implement runtime validation for extraction CLI auto page breaks#111
Implement runtime validation for extraction CLI auto page breaks#111
Conversation
📝 WalkthroughWalkthroughCOM availability probing was removed from parser construction; Changes
Sequence DiagramsequenceDiagram
participant User as User/CLI
participant Parser as Parser Builder
participant Help as Help Output
participant Executor as Runtime Executor
participant COM as COM Availability Checker
User->>Parser: exstruct --help
Parser->>Parser: build parser (no COM probe)
Parser->>Help: include --auto-page-breaks-dir
Help-->>User: show help
User->>Parser: exstruct --auto-page-breaks-dir DIR --mode <mode>
Parser->>Executor: hand off args
Executor->>Executor: _validate_auto_page_breaks_request(args)
alt mode == "libreoffice"
Executor->>Executor: call validate_libreoffice_process_request(..., include_auto_page_breaks=True)
Executor-->>User: fail early (libreoffice incompat)
else mode == "light"
Executor-->>User: fail (requires standard/verbose + Excel COM)
else mode in ["standard","verbose"]
Executor->>COM: get_com_availability()
COM-->>Executor: available / reason
alt COM available
Executor->>Executor: proceed with auto page-break export
Executor-->>User: success
else COM unavailable
Executor-->>User: fail (COM required + reason)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR updates the extraction CLI to avoid Excel COM probing during parser/help construction by moving capability checks for --auto-page-breaks-dir to runtime, and documents the resulting stable CLI contract across hosts (including a new ADR).
Changes:
- Make
build_parser()side-effect free by always registering--auto-page-breaks-dirand removing parser-time COM availability probing. - Add runtime validation for
--auto-page-breaks-dir(explicit rejection forlight, fast-fail LibreOffice combinations, and clear COM-unavailable errors forstandard/verbose). - Update tests and user/internal documentation (README, CLI docs, spec, ADR index/map) to reflect the always-visible flag + runtime validation policy.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/exstruct/cli/main.py | Removes parser-time COM probing; adds runtime validation for auto page-break export; wires validation into main(). |
| src/exstruct/cli/availability.py | Clarifies docstring to emphasize runtime usage. |
| tests/cli/test_cli.py | Adds regressions ensuring help does not probe COM; adds runtime validation coverage and LibreOffice fast-fail assertions. |
| tests/cli/test_edit_cli.py | Updates help test to no longer require injecting COM availability into parser construction. |
| docs/cli.md | Documents always-visible --auto-page-breaks-dir and runtime validation behavior. |
| README.md | Updates examples/prose to reflect always-visible flag + execution-time validation. |
| README.ja.md | Keeps Japanese README in parity with updated README behavior/wording. |
| dev-docs/specs/excel-extraction.md | Records internal guarantee that CLI validates auto page-break export at runtime. |
| dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.md | Adds ADR formalizing “no parser-time capability probing” policy and runtime validation approach. |
| dev-docs/adr/index.yaml | Registers ADR-0008 in the ADR index. |
| dev-docs/adr/decision-map.md | Adds ADR-0008 to the decision map categories. |
| dev-docs/adr/README.md | Adds ADR-0008 to the ADR table. |
| tasks/todo.md | Captures implementation/review notes and verification steps for issue #107. |
| tasks/feature_spec.md | Adds issue #107 working spec + follow-up contract notes and verification plan. |
Comments suppressed due to low confidence (1)
src/exstruct/cli/main.py:41
- The
--auto-page-breaks-dirhelp text is now slightly misleading/incomplete relative to the new runtime validation: it says only “not supported in libreoffice mode”, but the CLI also rejects--mode light, and it doesn’t mention that the output format follows--format(unlike--print-areas-dir). Consider updating the argument help string to explicitly state it requires--mode standard|verbosewith Excel COM and that the per-area files use the selected--format.
parser.add_argument(
"--auto-page-breaks-dir",
type=Path,
help=(
"Optional directory to write one file per auto page-break area "
"(Excel COM only; not supported in libreoffice mode)."
),
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/cli/test_cli.py (1)
370-435: Please add a LibreOffice.xlsregression case.These new CLI-side rejection tests only exercise
.xlsx. Since this path now depends onvalidate_libreoffice_process_request()to preserve the existing.xlsValueErrorprecedence, I’d add one case for--mode libreoffice --auto-page-breaks-dir ...against an.xlsinput so that contract stays pinned at the CLI layer too.Based on learnings: In
src/exstruct/constraints.py, the internal helper_validate_libreoffice_file_pathintentionally raisesValueError(notConfigError) for.xlsfiles in libreoffice mode. This is consistent with the current spec, implementation, and existing tests.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/cli/test_cli.py` around lines 370 - 435, Add a regression test mirroring test_cli_libreoffice_rejects_auto_page_breaks_dir that uses an .xls input to ensure CLI-level validation preserves the .xls ValueError precedence (the helper _validate_libreoffice_file_path in src/exstruct/constraints.py should still trigger). Create a new test (e.g. test_cli_libreoffice_rejects_auto_page_breaks_dir_on_xls) that monkeypatches exstruct.cli.main.process_excel and get_com_availability to raise if called, prepares an .xls file (use or add a helper like _prepare_sample_xls or write a minimal .xls Path), invokes _run_cli with ["<xls-path>", "--mode", "libreoffice", "--auto-page-breaks-dir", "<dir>"], asserts result.returncode != 0, and checks combined stdout/stderr contains the .xls-specific error text emitted by _validate_libreoffice_file_path (so the CLI rejects before probing COM or calling process_excel).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/cli/test_cli.py`:
- Around line 370-435: Add a regression test mirroring
test_cli_libreoffice_rejects_auto_page_breaks_dir that uses an .xls input to
ensure CLI-level validation preserves the .xls ValueError precedence (the helper
_validate_libreoffice_file_path in src/exstruct/constraints.py should still
trigger). Create a new test (e.g.
test_cli_libreoffice_rejects_auto_page_breaks_dir_on_xls) that monkeypatches
exstruct.cli.main.process_excel and get_com_availability to raise if called,
prepares an .xls file (use or add a helper like _prepare_sample_xls or write a
minimal .xls Path), invokes _run_cli with ["<xls-path>", "--mode",
"libreoffice", "--auto-page-breaks-dir", "<dir>"], asserts result.returncode !=
0, and checks combined stdout/stderr contains the .xls-specific error text
emitted by _validate_libreoffice_file_path (so the CLI rejects before probing
COM or calling process_excel).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 360e1739-4d58-431d-9c39-009951fdca49
📒 Files selected for processing (5)
dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.mdsrc/exstruct/cli/main.pytasks/feature_spec.mdtasks/todo.mdtests/cli/test_cli.py
✅ Files skipped from review due to trivial changes (3)
- dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.md
- tasks/todo.md
- tasks/feature_spec.md
Summary
--auto-page-breaks-dirin help and validate it only at runtimelibreofficeauto page-break requests in the CLI while preserving existing combined error precedenceTesting
Closes #107
Summary by CodeRabbit
Documentation
--auto-page-breaks-diris always shown in help; guidance now requires--mode standardor--mode verbosewith Excel COM for auto page-break export;mode=libreoffice/mode=lightrejection rules clarified.Bug Fixes
Tests