Add one-shot prompt generator for external LLM figure analysis by amahpour · Pull Request #17 · amahpour/datasheet-extractor

amahpour · 2026-03-17T05:12:58Z

Summary

This PR introduces an automated prompt generation system that creates self-contained markdown files for external LLM analysis of low-confidence figures. The system collects figures flagged by the local processor and organizes them into batches with explicit instructions for parallel processing and status reporting.

Key Changes

New module src/prompt_generator.py: Generates one-shot markdown prompts containing:
- All figures below the confidence threshold, grouped into batches of 20
- Absolute image paths for direct use with vision-capable LLMs (Claude, Codex, etc.)
- Instructions for parallel processing and batch-level status updates
- Optional integration with analysis schema templates
- Output paths for saving results back to the analysis directory
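The batching and rendering described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/prompt_generator.py. The figure dict keys (id, image_path, label, confidence), the function names make_batches and render_prompt, the analysis/<id>.json output layout, and the prompt wording are all assumptions.

```python
from pathlib import Path
from typing import Iterator

BATCH_SIZE = 20  # batch size stated in this PR


def make_batches(items: list[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def render_prompt(figures: list[dict], analysis_dir: Path) -> str:
    """Render a one-shot markdown prompt (hypothetical layout and wording).

    Each figure is a hypothetical dict with keys: id, image_path, label, confidence.
    """
    lines = [
        "# External figure analysis",
        "",
        "Process each batch in parallel (one subagent per batch) and "
        "report status after each batch completes.",
    ]
    for n, batch in enumerate(make_batches(figures), start=1):
        lines += ["", f"## Batch {n} ({len(batch)} figures)"]
        for fig in batch:
            # Absolute paths so a vision-capable LLM can open the image directly.
            image = Path(fig["image_path"]).resolve()
            # Hypothetical output location: analysis/<id>.json.
            out = (analysis_dir / f"{fig['id']}.json").resolve()
            lines.append(
                f"- {fig['id']}: image {image}; local label "
                f"{fig['label']!r} (confidence {fig['confidence']:.2f}); "
                f"save result to {out}"
            )
    return "\n".join(lines) + "\n"
```

With 45 flagged figures, this yields three batches of 20, 20, and 5. Fixed-size slicing keeps each batch small enough for one subagent while preserving figure order across batches.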
Refactored confidence scoring in src/local_processor.py:
- Replaced binary SIMPLE_TYPES/COMPLEX_TYPES classification logic with a continuous _compute_confidence() function
- Introduced CONFIDENCE_THRESHOLD (0.7) to determine which figures need external analysis
- Confidence scoring now considers description length and classification specificity
- Updated docstrings to clarify the distinction between local resolution and external queuing
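One way a continuous score could combine the two signals named above is sketched below. Only the 0.7 threshold comes from this PR. The 200-character saturation point, the equal 0.5 weights, the GENERIC_LABELS set, and the names compute_confidence and needs_external are hypothetical stand-ins for whatever _compute_confidence() actually does.

```python
CONFIDENCE_THRESHOLD = 0.7  # value stated in this PR

# Hypothetical: labels treated as too generic to count as a specific classification.
GENERIC_LABELS = {"figure", "image", "diagram", "other", "unknown"}


def compute_confidence(label: str, description: str) -> float:
    """Illustrative continuous score in [0, 1]; all weights are assumptions."""
    # Description length contributes up to 0.5, saturating at 200 characters.
    length_score = min(len(description.strip()), 200) / 200 * 0.5
    # A non-empty, non-generic label contributes the other 0.5.
    normalized = label.strip().lower()
    specificity_score = 0.0 if not normalized or normalized in GENERIC_LABELS else 0.5
    return round(length_score + specificity_score, 3)


def needs_external(label: str, description: str) -> bool:
    """Queue the figure for external analysis when below the threshold."""
    return compute_confidence(label, description) < CONFIDENCE_THRESHOLD
```

Unlike a fixed SIMPLE_TYPES/COMPLEX_TYPES lookup, a score like this still queues a figure with a specific label but a thin description (0.5 here), because it falls below the threshold.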
Integrated prompt generation into pipeline (src/pipeline.py):
- Added call to write_prompt() after local figure processing completes
- Generates external_analysis_prompt.md in the PDF output directory
- Logs the prompt path for user reference
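A sketch of this pipeline hook follows. It assumes write_prompt() returns the path it wrote, or None when no figure needs external analysis. The stand-in below takes already-rendered prompt text. The after_local_processing name and the "pipeline" logger name are hypothetical, and the real write_prompt() signature may differ.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("pipeline")  # hypothetical logger name

PROMPT_FILENAME = "external_analysis_prompt.md"  # file name stated in this PR


def write_prompt(pdf_out_dir: Path, prompt_text: Optional[str]) -> Optional[Path]:
    """Hypothetical stand-in: write the prompt file, or return None if there is nothing to write."""
    if not prompt_text:
        return None  # no figures need external analysis
    pdf_out_dir.mkdir(parents=True, exist_ok=True)
    path = pdf_out_dir / PROMPT_FILENAME
    path.write_text(prompt_text, encoding="utf-8")
    return path


def after_local_processing(pdf_out_dir: Path, prompt_text: Optional[str]) -> Optional[Path]:
    """Hypothetical pipeline step run once local figure processing has finished."""
    path = write_prompt(pdf_out_dir, prompt_text)
    if path is not None:
        logger.info("External analysis prompt written to %s", path)
    else:
        logger.info("No figures need external analysis")
    return path
```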
Updated documentation (prompts/figure_analysis.md):
- Simplified batch usage instructions to reference auto-generated prompts
- Removed manual workflow details in favor of pipeline-generated prompts
Test coverage (tests/test_prompt_generator.py):
- Tests for empty figure sets, batching logic, filtering, and file I/O
- Validates inclusion of absolute paths, subagent instructions, and template integration

Implementation Details

- Figures are loaded from processing/fig_*.json status files and filtered by status == "needs_external".
- Batches are capped at 20 figures so that each batch can be handed to a separate subagent and processed in parallel.
- The prompt includes Moondream's local classification and confidence score for each figure as context.
- Output paths are pre-computed so external LLMs can save results directly into the analysis directory structure.
- When no figures need external analysis, the system handles this gracefully and returns None.
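The loading and filtering steps above can be sketched as follows. The sketch assumes each processing/fig_*.json file holds a JSON object with a status field and an optional id. The collect_external_figures name, the output_path key, and the analysis/<id>.json layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def collect_external_figures(pdf_out_dir: Path) -> Optional[list[dict]]:
    """Load processing/fig_*.json and keep records with status == "needs_external".

    Returns None when no figure needs external analysis.
    The analysis/<id>.json output layout is a hypothetical illustration.
    """
    processing_dir = pdf_out_dir / "processing"
    if not processing_dir.is_dir():
        return None
    figures: list[dict] = []
    for status_file in sorted(processing_dir.glob("fig_*.json")):
        record = json.loads(status_file.read_text(encoding="utf-8"))
        if record.get("status") != "needs_external":
            continue
        fig_id = record.get("id", status_file.stem)
        # Pre-compute an absolute output path so the external LLM can save directly.
        record["output_path"] = str((pdf_out_dir / "analysis" / f"{fig_id}.json").resolve())
        figures.append(record)
    return figures or None
```

Returning None rather than an empty list lets the caller skip prompt generation with a single check.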

https://claude.ai/code/session_0139BXTEb42aXeukRzsaxyEQ

…prompts

Remove SIMPLE_TYPES/COMPLEX_TYPES tier logic from local_processor.py in favor of a single confidence threshold. Add prompt_generator.py that collects low-confidence figures after Moondream processing and generates a one-shot markdown prompt with absolute image paths, batched in groups of 20, with instructions for status updates and parallel subagent usage. The generated prompt is written to out/<pdf>/external_analysis_prompt.md and can be handed directly to Claude or Codex.


claude added 2 commits March 17, 2026 04:49

chore: update uv.lock after dependency resolution (67b79ef)


amahpour wants to merge 2 commits into main from claude/review-commit-history-1gGVh
