Skip to content

Implement runtime validation for extraction CLI auto page breaks#111

Merged
harumiWeb merged 2 commits intomainfrom
feat/cli-optimize
Mar 20, 2026
Merged

Implement runtime validation for extraction CLI auto page breaks#111
harumiWeb merged 2 commits intomainfrom
feat/cli-optimize

Conversation

@harumiWeb
Copy link
Copy Markdown
Owner

@harumiWeb harumiWeb commented Mar 20, 2026

Summary

  • stop probing COM availability while building the extraction CLI parser
  • always show --auto-page-breaks-dir in help and validate it only at runtime
  • fast-fail invalid libreoffice auto page-break requests in the CLI while preserving existing combined error precedence

Testing

  • uv run pytest tests/cli/test_cli.py tests/cli/test_edit_cli.py -q
  • uv run pytest tests/cli/test_cli.py -q
  • uv run task build-docs
  • uv run task precommit-run

Closes #107


Open with Devin

Summary by CodeRabbit

  • Documentation

    • CLI and README updated: --auto-page-breaks-dir is always shown in help; guidance now requires --mode standard or --mode verbose with Excel COM for auto page-break export; mode=libreoffice/mode=light rejection rules clarified.
  • Bug Fixes

    • Capability checks deferred to execution time; clearer runtime error messages when COM is unavailable or mode is incompatible.
  • Tests

    • CLI tests updated to assert help visibility and ensure no COM probing during parser/help.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

COM availability probing was removed from parser construction; --auto-page-breaks-dir is always registered in help. Validation for that flag is performed at execution time with mode-specific rules: light rejects it, libreoffice uses existing rejection, and standard/verbose require Excel COM or fail at runtime.

Changes

Cohort / File(s) Summary
CLI Documentation Updates
README.md, README.ja.md, docs/cli.md
Document --auto-page-breaks-dir as always shown in help; clarify runtime validation and mode-specific rejection rules (light, libreoffice, standard/verbose + Excel COM).
ADR & Specs
dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.md, dev-docs/adr/README.md, dev-docs/adr/decision-map.md, dev-docs/adr/index.yaml, dev-docs/specs/excel-extraction.md
Add ADR-0008 and register it; record policy to avoid COM probing at parser construction and require execution-time validation for COM-dependent flags; update excel-extraction spec accordingly.
Feature Spec & Tasks
tasks/feature_spec.md, tasks/todo.md
Add issue #107 spec and TODOs describing parser-side effect removal, runtime validation contract, and test expectations.
Core CLI Implementation
src/exstruct/cli/main.py, src/exstruct/cli/availability.py
Remove COM-availability gating from parser builder; always add --auto-page-breaks-dir; add _validate_auto_page_breaks_request() invoked at runtime to enforce mode-specific rules and query COM when needed; update availability docstring.
Tests
tests/cli/test_cli.py, tests/cli/test_edit_cli.py
Adjust tests to assert --auto-page-breaks-dir is always present and that help generation does not call COM checks; add tests for mode=light rejection and runtime COM-unavailable failure messages; simplify parser construction in edit-help test.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Parser as Parser Builder
    participant Help as Help Output
    participant Executor as Runtime Executor
    participant COM as COM Availability Checker

    User->>Parser: exstruct --help
    Parser->>Parser: build parser (no COM probe)
    Parser->>Help: include --auto-page-breaks-dir
    Help-->>User: show help

    User->>Parser: exstruct --auto-page-breaks-dir DIR --mode <mode>
    Parser->>Executor: hand off args
    Executor->>Executor: _validate_auto_page_breaks_request(args)
    alt mode == "libreoffice"
        Executor->>Executor: call validate_libreoffice_process_request(..., include_auto_page_breaks=True)
        Executor-->>User: fail early (libreoffice incompat)
    else mode == "light"
        Executor-->>User: fail (requires standard/verbose + Excel COM)
    else mode in ["standard","verbose"]
        Executor->>COM: get_com_availability()
        COM-->>Executor: available / reason
        alt COM available
            Executor->>Executor: proceed with auto page-break export
            Executor-->>User: success
        else COM unavailable
            Executor-->>User: fail (COM required + reason)
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Feat/edit mcp #57: Modifies CLI parser and argument forwarding in src/exstruct/cli/main.py (related to parser construction changes).
  • Add libreoffice extraction mode #76: Adjusts auto-page-breaks handling and mode-specific validation logic in src/exstruct/cli/main.py (directly related).

Suggested labels

enhancement

Poem

🐰 A quiet parser hops and hides,
No Excel spawn at help-time tides.
Flags stay listed, patience keeps—
Validation wakes when runtime peeps.
Hooray, swift starts and peaceful strides! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main objective of the PR: implementing runtime validation for the extraction CLI's auto page breaks feature.
Description check ✅ Passed The description provides a clear summary of changes, includes testing commands, and links to the related issue (#107), though it does not follow the repository's template structure.
Linked Issues check ✅ Passed The PR fully implements all coding requirements from issue #107: COM probing removed from parser construction, --auto-page-breaks-dir always exposed in CLI, runtime validation implemented with clear error messages for unsupported mode/environment combinations.
Out of Scope Changes check ✅ Passed All changes are directly related to issue #107 objectives: CLI startup optimization through deferred COM validation. Documentation, tests, ADRs, and code changes all support the core objective of removing COM probing from parser construction.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/cli-optimize
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

coderabbitai[bot]

This comment was marked as resolved.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the extraction CLI to avoid Excel COM probing during parser/help construction by moving capability checks for --auto-page-breaks-dir to runtime, and documents the resulting stable CLI contract across hosts (including a new ADR).

Changes:

  • Make build_parser() side-effect free by always registering --auto-page-breaks-dir and removing parser-time COM availability probing.
  • Add runtime validation for --auto-page-breaks-dir (explicit rejection for light, fast-fail LibreOffice combinations, and clear COM-unavailable errors for standard/verbose).
  • Update tests and user/internal documentation (README, CLI docs, spec, ADR index/map) to reflect the always-visible flag + runtime validation policy.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/exstruct/cli/main.py Removes parser-time COM probing; adds runtime validation for auto page-break export; wires validation into main().
src/exstruct/cli/availability.py Clarifies docstring to emphasize runtime usage.
tests/cli/test_cli.py Adds regressions ensuring help does not probe COM; adds runtime validation coverage and LibreOffice fast-fail assertions.
tests/cli/test_edit_cli.py Updates help test to no longer require injecting COM availability into parser construction.
docs/cli.md Documents always-visible --auto-page-breaks-dir and runtime validation behavior.
README.md Updates examples/prose to reflect always-visible flag + execution-time validation.
README.ja.md Keeps Japanese README in parity with updated README behavior/wording.
dev-docs/specs/excel-extraction.md Records internal guarantee that CLI validates auto page-break export at runtime.
dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.md Adds ADR formalizing “no parser-time capability probing” policy and runtime validation approach.
dev-docs/adr/index.yaml Registers ADR-0008 in the ADR index.
dev-docs/adr/decision-map.md Adds ADR-0008 to the decision map categories.
dev-docs/adr/README.md Adds ADR-0008 to the ADR table.
tasks/todo.md Captures implementation/review notes and verification steps for issue #107.
tasks/feature_spec.md Adds issue #107 working spec + follow-up contract notes and verification plan.
Comments suppressed due to low confidence (1)

src/exstruct/cli/main.py:41

  • The --auto-page-breaks-dir help text is now slightly misleading/incomplete relative to the new runtime validation: it says only “not supported in libreoffice mode”, but the CLI also rejects --mode light, and it doesn’t mention that the output format follows --format (unlike --print-areas-dir). Consider updating the argument help string to explicitly state it requires --mode standard|verbose with Excel COM and that the per-area files use the selected --format.
    parser.add_argument(
        "--auto-page-breaks-dir",
        type=Path,
        help=(
            "Optional directory to write one file per auto page-break area "
            "(Excel COM only; not supported in libreoffice mode)."
        ),

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/cli/test_cli.py (1)

370-435: Please add a LibreOffice .xls regression case.

These new CLI-side rejection tests only exercise .xlsx. Since this path now depends on validate_libreoffice_process_request() to preserve the existing .xls ValueError precedence, I’d add one case for --mode libreoffice --auto-page-breaks-dir ... against an .xls input so that contract stays pinned at the CLI layer too.

Based on learnings: In src/exstruct/constraints.py, the internal helper _validate_libreoffice_file_path intentionally raises ValueError (not ConfigError) for .xls files in libreoffice mode. This is consistent with the current spec, implementation, and existing tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cli/test_cli.py` around lines 370 - 435, Add a regression test
mirroring test_cli_libreoffice_rejects_auto_page_breaks_dir that uses an .xls
input to ensure CLI-level validation preserves the .xls ValueError precedence
(the helper _validate_libreoffice_file_path in src/exstruct/constraints.py
should still trigger). Create a new test (e.g.
test_cli_libreoffice_rejects_auto_page_breaks_dir_on_xls) that monkeypatches
exstruct.cli.main.process_excel and get_com_availability to raise if called,
prepares an .xls file (use or add a helper like _prepare_sample_xls or write a
minimal .xls Path), invokes _run_cli with ["<xls-path>", "--mode",
"libreoffice", "--auto-page-breaks-dir", "<dir>"], asserts result.returncode !=
0, and checks combined stdout/stderr contains the .xls-specific error text
emitted by _validate_libreoffice_file_path (so the CLI rejects before probing
COM or calling process_excel).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/cli/test_cli.py`:
- Around line 370-435: Add a regression test mirroring
test_cli_libreoffice_rejects_auto_page_breaks_dir that uses an .xls input to
ensure CLI-level validation preserves the .xls ValueError precedence (the helper
_validate_libreoffice_file_path in src/exstruct/constraints.py should still
trigger). Create a new test (e.g.
test_cli_libreoffice_rejects_auto_page_breaks_dir_on_xls) that monkeypatches
exstruct.cli.main.process_excel and get_com_availability to raise if called,
prepares an .xls file (use or add a helper like _prepare_sample_xls or write a
minimal .xls Path), invokes _run_cli with ["<xls-path>", "--mode",
"libreoffice", "--auto-page-breaks-dir", "<dir>"], asserts result.returncode !=
0, and checks combined stdout/stderr contains the .xls-specific error text
emitted by _validate_libreoffice_file_path (so the CLI rejects before probing
COM or calling process_excel).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 360e1739-4d58-431d-9c39-009951fdca49

📥 Commits

Reviewing files that changed from the base of the PR and between 86aabfb and 9c96af0.

📒 Files selected for processing (5)
  • dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.md
  • src/exstruct/cli/main.py
  • tasks/feature_spec.md
  • tasks/todo.md
  • tests/cli/test_cli.py
✅ Files skipped from review due to trivial changes (3)
  • dev-docs/adr/ADR-0008-extraction-cli-runtime-capability-validation.md
  • tasks/todo.md
  • tasks/feature_spec.md

@harumiWeb harumiWeb merged commit 993afbd into main Mar 20, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CLI startup optimization: avoid COM availability probing during parser construction

2 participants