
@ppinchuk (Collaborator)

Add missing components, including LLM-generated keywords, heuristic keyword generation, and text extraction.

@ppinchuk ppinchuk self-assigned this Feb 12, 2026
@ppinchuk ppinchuk requested a review from castelao as a code owner February 12, 2026 01:21
@ppinchuk ppinchuk added the enhancement (Update to logic or general code improvements) label Feb 12, 2026
Copilot AI review requested due to automatic review settings February 12, 2026 01:21
@ppinchuk ppinchuk added the new computation (Update that adds a new computation method) label Feb 12, 2026
@ppinchuk ppinchuk added the p-critical (Priority: critical) and topic-python-general (Issues/pull requests related to python) labels Feb 12, 2026

codecov-commenter commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 18.00000% with 246 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.85%. Comparing base (b5460fb) to head (46ec65b).

Files with missing lines                  Patch %   Lines
compass/plugin/one_shot/base.py           11.18%    143 Missing ⚠️
compass/plugin/one_shot/generators.py     16.66%    50 Missing ⚠️
compass/plugin/one_shot/cache.py          22.22%    42 Missing ⚠️
compass/plugin/one_shot/components.py     50.00%    9 Missing ⚠️
compass/plugin/interface.py               33.33%    2 Missing ⚠️
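The per-file figures add up to the 246 missing lines quoted in the header. As a sketch, assuming Codecov's patch percentage is covered / (covered + missing) changed lines and that fully covered changed files are simply left out of the list, the covered and total line counts for each file can be recovered from the two columns. `implied_counts` is a hypothetical helper, not part of COMPASS or Codecov.

```python
# Recover per-file covered/total changed-line counts from "Patch %" and "Missing".
# Assumption: patch % = covered / (covered + missing) * 100.

def implied_counts(patch_pct: float, missing: int) -> tuple[int, int]:
    """Return (covered, total) changed lines implied by a patch % and a missing count."""
    total = round(missing / (1 - patch_pct / 100))
    return total - missing, total

files = {
    "compass/plugin/one_shot/base.py": (11.18, 143),
    "compass/plugin/one_shot/generators.py": (16.66, 50),
    "compass/plugin/one_shot/cache.py": (22.22, 42),
    "compass/plugin/one_shot/components.py": (50.00, 9),
    "compass/plugin/interface.py": (33.33, 2),
}

counts = {name: implied_counts(pct, miss) for name, (pct, miss) in files.items()}
for name, (covered, total) in counts.items():
    print(f"{name}: {covered}/{total} changed lines covered")

# Whole patch: 246 missing lines at 18.00% coverage.
listed_missing = sum(missing for _, missing in files.values())  # 246, as in the header
patch_total = round(listed_missing / (1 - 18.00 / 100))  # changed lines across the patch
print(f"patch: {patch_total - listed_missing}/{patch_total} changed lines covered")
```

The listed files account for 50 covered lines out of 296, while the 18% headline implies 54 of 300; the remaining 4 covered lines presumably sit in changed files that are not listed because they have no missing lines.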

❌ Your patch status has failed because the patch coverage (18.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
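To put the 80% target in concrete terms: 18% of the patch means 54 of 300 changed lines are covered (300 total, 246 missing). A minimal sketch, assuming the number of changed lines stays fixed while tests are added, of how many more lines would need coverage; `lines_to_target` is a hypothetical helper.

```python
import math

def lines_to_target(covered: int, total: int, target_pct: float) -> int:
    """Smallest number of additional covered lines needed to reach target_pct,
    assuming the total number of changed lines does not change."""
    needed = math.ceil(total * target_pct / 100)
    return max(0, needed - covered)

# 54 of 300 changed lines covered (18.00%) against an 80.00% target.
print(lines_to_target(covered=54, total=300, target_pct=80.0))
```

Under those assumptions, 240 covered lines are required, so 186 of the 246 currently missing lines would need tests.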

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #379      +/-   ##
==========================================
- Coverage   56.42%   54.85%   -1.58%     
==========================================
  Files          60       61       +1     
  Lines        5366     5589     +223     
  Branches      484      525      +41     
==========================================
+ Hits         3028     3066      +38     
- Misses       2292     2477     +185     
  Partials       46       46              
Flag        Coverage           Δ
unittests   54.85% <18.00%>    (-1.58%) ⬇️
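The project percentages in the diff follow from the hit/miss/partial counts. A sketch, assuming Codecov's ratio is hits / (hits + misses + partials) and that it uses its default `round: down` with `precision: 2`; `codecov_pct` is a hypothetical helper.

```python
import math

def codecov_pct(hits: int, misses: int, partials: int, precision: int = 2) -> float:
    """Coverage % = hits / (hits + misses + partials), rounded *down* to `precision`
    decimals (assumed to mirror Codecov's default rounding settings)."""
    lines = hits + misses + partials
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale

base = codecov_pct(3028, 2292, 46)  # main: 5366 lines
head = codecov_pct(3066, 2477, 46)  # this PR: 5589 lines
print(base, head)
```

Rounding down matters here: 3066 / 5589 is about 54.858%, which Codecov reports as 54.85% rather than 54.86%.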

Flags with carried forward coverage won't be shown.


Copilot AI (Contributor) left a comment


Pull request overview

This PR completes missing parts of the one-shot schema extraction plugin by adding LLM-generated website keywords, heuristic keyword generation, and schema-based text extraction, and adds a CLI example for running COMPASS against known local documents.
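The schema-based text extraction returns either verbatim text or null. A minimal sketch of how a caller might enforce that contract on a structured-output response follows; the schema dict, the `extracted_text` field name, and `parse_extraction` are hypothetical and are not copied from `extract_text.json5` or COMPASS.

```python
import json
from typing import Optional

# Hypothetical structured-output schema: a single field that is a string or null.
EXTRACT_TEXT_SCHEMA = {
    "type": "object",
    "properties": {"extracted_text": {"type": ["string", "null"]}},
    "required": ["extracted_text"],
    "additionalProperties": False,
}

def parse_extraction(raw_response: str, source_text: str) -> Optional[str]:
    """Parse a structured-output response and enforce the verbatim-or-null rule.

    Only the required field and the verbatim check are validated here; this is
    not a full JSON Schema validator.
    """
    payload = json.loads(raw_response)
    missing = [key for key in EXTRACT_TEXT_SCHEMA["required"] if key not in payload]
    if missing:
        raise ValueError(f"response is missing required field(s): {missing}")
    extracted = payload["extracted_text"]
    if extracted is None:
        return None  # the model reports that no relevant text exists
    if not isinstance(extracted, str):
        raise ValueError("extracted_text must be a string or null")
    if extracted not in source_text:
        # Paraphrased or invented text is rejected instead of being passed on.
        raise ValueError("extracted_text is not a verbatim excerpt of the source")
    return extracted
```

Checking substring membership keeps the "verbatim" promise testable without trusting the model's own claim.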

Changes:

  • Add LLM-driven generators + caching for query templates, website keywords, and heuristic keyword lists in the one-shot plugin.
  • Implement schema-based text extraction (structured-output) and update plugin/extractor call paths accordingly.
  • Add a CLI “parse existing docs” example and wire it into the Sphinx examples index.
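The file summary notes that the hashing in the new disk cache is not stable. Assuming the cause is Python's built-in `hash()`, whose results for `str` and `bytes` are salted per process (see `PYTHONHASHSEED`), a sketch of a run-independent cache key is a SHA-256 digest over canonical JSON. `cache_key` and `load_or_generate` are hypothetical names, not the COMPASS cache API.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable

def cache_key(**inputs: Any) -> str:
    """Deterministic key: SHA-256 over canonical JSON (sorted keys, fixed separators).

    Unlike built-in hash() on strings, this digest is identical across interpreter runs.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def load_or_generate(cache_dir: Path, generate: Callable[..., Any], **inputs: Any) -> Any:
    """Return cached JSON output for `inputs`, calling `generate(**inputs)` on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(**inputs)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = generate(**inputs)  # e.g. an LLM call producing website keywords
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Sorting keys means the same inputs passed in a different order map to the same cache file.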

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 8 comments.

File Description
examples/parse_existing_docs/CLI/local_docs_minimal.json5 Adds minimal local-doc mapping example (currently references a non-existent PDF filename).
examples/parse_existing_docs/CLI/local_docs.json5 Adds fuller local-doc mapping example with metadata fields.
examples/parse_existing_docs/CLI/jurisdictions.csv Adds sample jurisdictions input for the local-docs CLI run.
examples/parse_existing_docs/CLI/config.json5 Adds sample run config demonstrating known_local_docs + disabled search.
examples/parse_existing_docs/CLI/README.rst Adds CLI walkthrough for processing local docs (contains a couple of typos).
examples/one_shot_schema_extraction/plugin_config_simple.json5 Updates config option name + enables heuristic keyword auto-generation.
examples/one_shot_schema_extraction/plugin_config.yaml Refreshes website keywords and adds heuristic keyword lists example.
examples/one_shot_schema_extraction/README.rst Updates option name and fixes a doc link.
docs/source/examples/index.rst Adds the “parse existing docs via CLI” example to the docs toctree.
compass/services/threaded.py Adjusts jurisdiction document info dumping (currently breaks filename reporting for local docs).
compass/plugin/ordinance.py Refactors text extractors to be direct LLM callers; updates usage labeling + call path.
compass/plugin/one_shot/schemas/website_keywords.json5 Adds schema for LLM-generated website keyword weights.
compass/plugin/one_shot/schemas/heuristic_keywords.json5 Adds schema for LLM-generated heuristic keyword lists.
compass/plugin/one_shot/schemas/extract_text.json5 Adds schema for structured-output text extraction (verbatim or null).
compass/plugin/one_shot/generators.py Adds website keyword + heuristic keyword generators and keyword normalization/deduping.
compass/plugin/one_shot/components.py Implements schema-based text extractor/collector components (has a prompt typo).
compass/plugin/one_shot/cache.py Adds a disk cache for LLM-generated content (hashing is not stable).
compass/plugin/one_shot/base.py Wires in new generators, caching, heuristic support, and schema-based text extraction.
compass/plugin/noop.py Removes legacy llm_caller init pattern for NoOp text extractor.
compass/plugin/interface.py Updates text extraction instantiation and uses async get_heuristic() in filtering.
compass/extraction/apply.py Improves attempt-count logging format for ngram-checked extraction retries.
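`generators.py` is described as adding keyword normalization and deduping. A minimal sketch of that idea, under hypothetical rules (NFKC normalization, whitespace collapsing, casefolding, first-seen order for lists, highest weight wins for weighted website keywords); `normalize_keywords` and `merge_keyword_weights` are illustrative names, not the functions in the PR.

```python
import re
import unicodedata

def _normalize(keyword: str) -> str:
    """Hypothetical normalization: NFKC, collapse whitespace, trim, casefold."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFKC", keyword)).strip().casefold()

def normalize_keywords(keywords: list[str]) -> list[str]:
    """Normalize LLM-generated keywords and drop duplicates, keeping first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for kw in keywords:
        cleaned = _normalize(kw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result

def merge_keyword_weights(weights: dict[str, float]) -> dict[str, float]:
    """Merge weighted keywords that normalize to the same string, keeping the max weight."""
    merged: dict[str, float] = {}
    for kw, weight in weights.items():
        key = _normalize(kw)
        if key:
            merged[key] = max(weight, merged.get(key, weight))
    return merged
```

Normalizing before deduping means "Wind  Ordinance" and "wind ordinance" from separate LLM calls collapse to one entry.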

@ppinchuk ppinchuk merged commit 701f9f9 into main Feb 12, 2026
18 checks passed
@ppinchuk ppinchuk deleted the pp/finish_one_shot branch February 12, 2026 05:06

Labels

enhancement (Update to logic or general code improvements)
new computation (Update that adds a new computation method)
p-critical (Priority: critical)
topic-python-general (Issues/pull requests related to python)

Projects

None yet


2 participants