Add one-shot prompt generator for external LLM figure analysis #17

Open
amahpour wants to merge 2 commits into main from claude/review-commit-history-1gGVh

Conversation

@amahpour

Owner

Summary

This PR introduces an automated prompt generation system that creates self-contained markdown files for external LLM analysis of low-confidence figures. The system collects figures flagged by the local processor and organizes them into batches with explicit instructions for parallel processing and status reporting.

Key Changes

  • New module src/prompt_generator.py: Generates one-shot markdown prompts containing:

    • All figures below the confidence threshold, grouped into batches of 20
    • Absolute image paths for direct use with vision-capable LLMs (Claude, Codex, etc.)
    • Instructions for parallel processing and batch-level status updates
    • Optional integration with analysis schema templates
    • Output paths for saving results back to the analysis directory
  • Refactored confidence scoring in src/local_processor.py:

    • Replaced binary SIMPLE_TYPES/COMPLEX_TYPES classification logic with a continuous _compute_confidence() function
    • Introduced CONFIDENCE_THRESHOLD (0.7) to determine which figures need external analysis
    • Confidence scoring now considers description length and classification specificity
    • Updated docstrings to clarify the distinction between local resolution and external queuing
  • Integrated prompt generation into pipeline (src/pipeline.py):

    • Added call to write_prompt() after local figure processing completes
    • Generates external_analysis_prompt.md in the PDF output directory
    • Logs the prompt path for user reference
  • Updated documentation (prompts/figure_analysis.md):

    • Simplified batch usage instructions to reference auto-generated prompts
    • Removed manual workflow details in favor of pipeline-generated prompts
  • New tests (tests/test_prompt_generator.py):

    • Tests for empty figure sets, batching logic, filtering, and file I/O
    • Validates inclusion of absolute paths, subagent instructions, and template integration
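The continuous confidence score described above could look roughly like the following. This is a minimal sketch, not the PR's actual `_compute_confidence()`: only the 0.7 threshold and the two inputs (description length and classification specificity) come from the description. The function name `compute_confidence_sketch`, the `GENERIC_LABELS` set, the 40-word saturation point, and the 0.5/0.5 weighting are all assumptions made for illustration.

```python
CONFIDENCE_THRESHOLD = 0.7  # value stated in the PR description

# Hypothetical: labels considered too generic to trust on their own.
GENERIC_LABELS = {"", "other", "unknown", "figure", "image"}


def compute_confidence_sketch(classification: str, description: str) -> float:
    """Return an illustrative score in [0, 1]; the weights are assumptions."""
    label = classification.strip().lower()
    # A specific label contributes half of the score, a generic one nothing.
    specificity = 0.0 if label in GENERIC_LABELS else 0.5
    # Description length contributes the other half, saturating at 40 words.
    word_count = len(description.split())
    length_score = 0.5 * min(word_count / 40, 1.0)
    return specificity + length_score


def needs_external(classification: str, description: str) -> bool:
    """A figure is queued for external analysis when it scores below the threshold."""
    score = compute_confidence_sketch(classification, description)
    return score < CONFIDENCE_THRESHOLD
```

With these assumed weights, a figure labelled "unknown" with a two-word description scores far below 0.7 and would be queued, while a specifically labelled figure with a long description would be resolved locally.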

Implementation Details

  • Figures are loaded from processing/fig_*.json status files and filtered by status == "needs_external"
  • Batches are created with explicit size limits to enable parallel subagent processing
  • The prompt includes Moondream's local classification and confidence scores for context
  • Output paths are pre-computed so external LLMs can save results directly to the analysis directory structure
  • When no figures need external analysis, the generator returns None instead of producing a prompt
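The load, filter, and batch steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the code in src/prompt_generator.py: the `fig_*.json` glob, the `status == "needs_external"` filter, the batch size of 20, and the `None` return come from the description, while the function names and the JSON fields `image_path`, `classification`, and `confidence` are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional

BATCH_SIZE = 20  # batch size stated in the PR description


def collect_external_figures(processing_dir: Path) -> List[dict]:
    """Load fig_*.json status files and keep those flagged for external analysis."""
    figures = []
    for path in sorted(processing_dir.glob("fig_*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if record.get("status") == "needs_external":
            figures.append(record)
    return figures


def make_batches(items: List[dict], size: int = BATCH_SIZE) -> List[List[dict]]:
    """Split items into consecutive batches of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_prompt_sketch(processing_dir: Path) -> Optional[str]:
    """Return markdown prompt text, or None when nothing needs external analysis."""
    figures = collect_external_figures(processing_dir)
    if not figures:
        return None
    lines = ["# External figure analysis", ""]
    for number, batch in enumerate(make_batches(figures), start=1):
        lines.append(f"## Batch {number} ({len(batch)} figures)")
        for fig in batch:
            # Hypothetical field names; absolute paths let a vision LLM open the file.
            image = Path(fig["image_path"]).resolve()
            label = fig.get("classification", "unknown")
            confidence = fig.get("confidence", "n/a")
            lines.append(f"- {image} (local: {label}, confidence: {confidence})")
        lines.append("")
    return "\n".join(lines)
```

Keeping collection, batching, and rendering in separate functions makes each step testable on its own, which matches the test areas the PR lists (empty figure sets, batching logic, filtering).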

https://claude.ai/code/session_0139BXTEb42aXeukRzsaxyEQ

claude added 2 commits March 17, 2026 04:49
…prompts

Remove SIMPLE_TYPES/COMPLEX_TYPES tier logic from local_processor.py in
favor of a single confidence threshold. Add prompt_generator.py that
collects low-confidence figures after Moondream processing and generates
a one-shot markdown prompt with absolute image paths, batched in groups
of 20, with instructions for status updates and parallel subagent usage.
The generated prompt is written to out/<pdf>/external_analysis_prompt.md
and can be handed directly to Claude or Codex.

https://claude.ai/code/session_0139BXTEb42aXeukRzsaxyEQ