PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated May 5, 2026 · Python
Extract structured data using local or remote LLMs.
Reproducible diagnostic investigation of a fine-tuned SLM that scored 99.75% on evaluation and failed silently on 10% of production inputs. Full pipeline. Every number verified.
Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.
A simple LLM library.
news-summizr extracts structured summaries from headlines, labeling key points such as announcements, products, and region for quick insight.
Collection of purpose-built MCP servers for AI agent workflows.
A package for structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs
Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.
Structured Data Agent Pack for AI coding agents: AGENTS.md, CLAUDE.md, evals, and pitfall recovery for vitalops/datatune.
Multilingual structured OCR (11+ languages, CJK-tuned) — MCP server with verified per-character bboxes for AI agents
Human-in-the-loop LLM orchestration with structured signal extraction and session persistence. Annotate confusion and curiosity—feedback shapes responses, topology accumulates over time. API-first design, no gamification. FastAPI + Claude + SQLite + D3.
Source content for Vstorm blog posts—carefully crafted to provide both depth and clarity, with practical insights readers can apply immediately.
AI-assisted PDF/DOCX packet structuring workflow with source citations, semantic retrieval, deterministic validation, and reviewer-facing run sheets.
AI-agent-driven venue governance database. Extracts editorial boards and program committees from journal websites using local LLMs, with entity resolution against OpenAlex.
Evaluate local LLM accuracy on structured data extraction. Tests models' ability to extract JSON from unstructured text with ground-truth comparison, F1 scoring, and fuzzy matching. Supports MLX and Ollama backends. Generates interactive reports with charts and per-model analysis.
Robust extraction of structured signals from messy unstructured text. Hybrid LLM + tool-use schema + source span linking + eval harness.