Finish one shot plugin implementation #379
Codecov Report
❌ Your patch status has failed because the patch coverage (18.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Coverage Diff
| Metric | main | #379 | +/- |
|---|---|---|---|
| Coverage | 56.42% | 54.85% | -1.58% |
| Files | 60 | 61 | +1 |
| Lines | 5366 | 5589 | +223 |
| Branches | 484 | 525 | +41 |
| Hits | 3028 | 3066 | +38 |
| Misses | 2292 | 2477 | +185 |
| Partials | 46 | 46 | |
Pull request overview
This PR completes missing parts of the one-shot schema extraction plugin by adding LLM-generated website keywords, heuristic keyword generation, and schema-based text extraction, and adds a CLI example for running COMPASS against known local documents.
Changes:
- Add LLM-driven generators + caching for query templates, website keywords, and heuristic keyword lists in the one-shot plugin.
- Implement schema-based text extraction (structured-output) and update plugin/extractor call paths accordingly.
- Add a CLI “parse existing docs” example and wire it into the Sphinx examples index.
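To illustrate the "verbatim or null" contract that the schema-based text extraction aims for, here is a minimal sketch of a post-check on a structured-output response. The function name `keep_if_verbatim` and the whitespace-collapsing rule are assumptions made for illustration; they are not taken from the PR.

```python
def _collapse_whitespace(text):
    """Collapse all runs of whitespace into single spaces."""
    return " ".join(text.split())


def keep_if_verbatim(extracted, source_text):
    """Return `extracted` only if it occurs verbatim in `source_text`.

    Hypothetical helper: a model response that is null (None), empty,
    or not found in the source document is mapped to None.
    """
    if extracted is None:
        return None
    # Compare on whitespace-collapsed text so that line wrapping in the
    # source document does not cause a false mismatch.
    snippet = _collapse_whitespace(extracted)
    if snippet and snippet in _collapse_whitespace(source_text):
        return extracted
    return None
```

A check like this would reject paraphrased or invented passages while tolerating differences in line breaks.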
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| examples/parse_existing_docs/CLI/local_docs_minimal.json5 | Adds minimal local-doc mapping example (currently references a non-existent PDF filename). |
| examples/parse_existing_docs/CLI/local_docs.json5 | Adds fuller local-doc mapping example with metadata fields. |
| examples/parse_existing_docs/CLI/jurisdictions.csv | Adds sample jurisdictions input for the local-docs CLI run. |
| examples/parse_existing_docs/CLI/config.json5 | Adds sample run config demonstrating known_local_docs + disabled search. |
| examples/parse_existing_docs/CLI/README.rst | Adds CLI walkthrough for processing local docs (contains a couple of typos). |
| examples/one_shot_schema_extraction/plugin_config_simple.json5 | Updates config option name + enables heuristic keyword auto-generation. |
| examples/one_shot_schema_extraction/plugin_config.yaml | Refreshes website keywords and adds heuristic keyword lists example. |
| examples/one_shot_schema_extraction/README.rst | Updates option name and fixes a doc link. |
| docs/source/examples/index.rst | Adds the “parse existing docs via CLI” example to the docs toctree. |
| compass/services/threaded.py | Adjusts jurisdiction document info dumping (currently breaks filename reporting for local docs). |
| compass/plugin/ordinance.py | Refactors text extractors to be direct LLM callers; updates usage labeling + call path. |
| compass/plugin/one_shot/schemas/website_keywords.json5 | Adds schema for LLM-generated website keyword weights. |
| compass/plugin/one_shot/schemas/heuristic_keywords.json5 | Adds schema for LLM-generated heuristic keyword lists. |
| compass/plugin/one_shot/schemas/extract_text.json5 | Adds schema for structured-output text extraction (verbatim or null). |
| compass/plugin/one_shot/generators.py | Adds website keyword + heuristic keyword generators and keyword normalization/deduping. |
| compass/plugin/one_shot/components.py | Implements schema-based text extractor/collector components (has a prompt typo). |
| compass/plugin/one_shot/cache.py | Adds a disk cache for LLM-generated content (hashing is not stable). |
| compass/plugin/one_shot/base.py | Wires in new generators, caching, heuristic support, and schema-based text extraction. |
| compass/plugin/noop.py | Removes legacy llm_caller init pattern for NoOp text extractor. |
| compass/plugin/interface.py | Updates text extraction instantiation and uses async get_heuristic() in filtering. |
| compass/extraction/apply.py | Improves attempt-count logging format for ngram-checked extraction retries. |
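The note on `compass/plugin/one_shot/cache.py` says the cache hashing is not stable. The table does not say what the current implementation does, so the following is only a sketch of one common remedy: derive the disk-cache key from a content digest over a canonical serialization, rather than from anything that can vary between interpreter runs (Python's built-in `hash()` on strings, for example, is salted per process by default). The name `stable_cache_key` and the choice of JSON as the canonical form are assumptions.

```python
import hashlib
import json


def stable_cache_key(payload):
    """Return a hex digest that is identical across interpreter runs.

    Hypothetical helper: `payload` is assumed to be JSON-serializable
    (for example, a dict holding the prompt text and schema name).
    """
    # Canonical JSON: sorted keys and fixed separators, so that dict
    # insertion order and whitespace cannot change the digest.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With a key like this, two runs that request the same generated content would resolve to the same cache file regardless of process or key order.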
Add the missing components, including LLM-generated keywords, heuristics, and text extraction.