Fix analysis runtime and document review workflow #11
Merged
Pull request overview
This PR updates the document review + analysis workflow across the FastAPI backend and Vue web UI: analysis is started server-side on upload, document metadata is enriched from extracted pages, PDF preview switches to PDFium-rendered page images, and OpenAI-compatible structured output handling is hardened for DeepSeek/Ollama. It also standardizes Docker host port bindings and aligns helper scripts/docs accordingly.
Changes:
- Start analysis automatically on `POST /v1/documents` (prevents orphan queued tasks) and improve task state persistence/visibility.
- Enrich document list metadata from extracted `pages.json`, and add a `/pages/{n}/image` endpoint + UI PDFium image preview.
- Improve table browsing UX (search + merged-cell rendering) and extend OpenAI-compatible client fallback behavior + tests.
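The first change above (starting analysis as part of the upload request rather than waiting for a client follow-up call) can be sketched as follows. This is a minimal illustration using only `asyncio` and an in-memory dict; the names `TaskRecord`, `TASKS`, `run_analysis`, and `upload_document` are hypothetical and do not come from the PR, whose real handler lives in `src/api/routes.py` and is not shown here.

```python
import asyncio
import uuid
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a persisted analysis task."""
    task_id: str
    filename: str
    status: str = "queued"


# Hypothetical in-memory task store; the real project persists tasks.
TASKS: dict[str, TaskRecord] = {}


async def run_analysis(record: TaskRecord) -> None:
    """Placeholder for the analysis pipeline."""
    record.status = "running"
    await asyncio.sleep(0)  # stand-in for real work
    record.status = "succeeded"


async def upload_document(filename: str) -> tuple[TaskRecord, asyncio.Task]:
    """Create the task and schedule analysis in the same request.

    Because scheduling happens here, a missed client-side follow-up
    request can no longer leave the task stuck in "queued".
    """
    record = TaskRecord(task_id=uuid.uuid4().hex, filename=filename)
    TASKS[record.task_id] = record
    job = asyncio.create_task(run_analysis(record))
    return record, job


async def main() -> tuple[str, str]:
    record, job = await upload_document("report.pdf")
    status_at_response = record.status  # analysis has not run yet
    await job
    return status_at_response, record.status
```

The handler returns the `asyncio.Task` so the caller can keep a reference to it; a scheduled task with no remaining references may be garbage-collected before it finishes.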
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| web/src/views/UploadView.vue | Stops calling analyze() client-side; relies on server-side auto-start. |
| web/src/components/TablesPanel.vue | Adds table search + merged-cell grid rendering and stats display. |
| web/src/components/PdfViewer.vue | Replaces iframe PDF preview with PDFium-rendered page images + paging toolbar. |
| web/src/api/types.ts | Extends ExtractedTable and introduces ExtractedPage for new endpoints/UI needs. |
| web/src/api/docs.ts | Adds pages() and pageImageBlob() API helpers. |
| src/api/routes.py | Starts analysis on upload, enriches metas on read/list, adds page image endpoint. |
| src/utils/document_metadata.py | Adds metadata inference/enrichment helpers (company/report type/period end). |
| src/agent/nodes.py | Persists enriched metadata during pipeline finalize. |
| src/storage/task_store.py | Ensures get() sees updates from other SQLite connections via rollback. |
| src/schemas/models.py | Drops invalid evidence entries before validation to avoid runtime failures. |
| src/llm/openai_client.py | Provider-aware structured output fallback behavior + JSON payload parsing hardening. |
| src/llm/base.py | Passes provider into OpenAILLMClient construction. |
| tests/test_routes_web.py | Adds coverage for metadata enrichment, upload auto-analysis, and PDFium image endpoint. |
| tests/test_storage.py | Adds regression test ensuring cross-store updates are visible. |
| tests/test_schemas.py | Adds regression test for dropping evidence missing required page references. |
| tests/test_openai_compatible_client.py | New test ensuring DeepSeek structured output uses chat.completions fallback. |
| scripts/open_ui_after_docker.py | Aligns UI open helper with fixed Docker host port. |
| docker-compose.yml | Locks host ports to project-standard bindings. |
| README.md | Updates Docker URLs/ports and documents fixed host ports. |
| .env.example | Updates DeepSeek base URL and removes Docker host-port override variables. |
| .gitignore | Ignores .env.example (note: file is still tracked). |
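The `src/storage/task_store.py` row mentions a rollback so that `get()` sees updates from other SQLite connections. The sketch below is not the project's code. It reproduces one way a connection can keep reading a stale snapshot (WAL journal mode plus an open read transaction, both assumptions made for this demo) and shows that ending the transaction with `rollback()` makes the next read current. The table layout and the `read_status` and `demo` helpers are hypothetical.

```python
import os
import sqlite3
import tempfile


def read_status(conn: sqlite3.Connection, task_id: str) -> str:
    """Return the status column for one task."""
    rows = conn.execute(
        "SELECT status FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchall()
    return rows[0][0]


def demo() -> tuple[str, str]:
    """Return (stale_read, fresh_read) as seen by a second connection."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tasks.db")

        writer = sqlite3.connect(path)
        writer.execute("PRAGMA journal_mode=WAL")  # assumption: WAL mode
        writer.execute(
            "CREATE TABLE tasks (task_id TEXT PRIMARY KEY, status TEXT)"
        )
        writer.execute("INSERT INTO tasks VALUES ('t1', 'queued')")
        writer.commit()

        reader = sqlite3.connect(path)
        try:
            # Open a read transaction; the first SELECT pins a snapshot.
            reader.execute("BEGIN")
            read_status(reader, "t1")

            # Another connection commits an update in the meantime.
            writer.execute(
                "UPDATE tasks SET status = 'running' WHERE task_id = 't1'"
            )
            writer.commit()

            stale = read_status(reader, "t1")  # still the old snapshot
            reader.rollback()                  # end the read transaction
            fresh = read_status(reader, "t1")  # new snapshot
            return stale, fresh
        finally:
            reader.close()
            writer.close()
```

Calling `rollback()` before a read is cheap when no transaction is open, which is presumably why it suits a `get()` path; whether the project hit exactly this scenario is not stated in the PR.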
| > | ||
| <div class="item-title">{{ t.title || t.table_id }}</div> | ||
| <div class="item-meta muted">第 {{ t.page }} 页 · {{ t.cells.length }} 单元格</div> | ||
| <div class="item-meta muted">第 {{ t.page }} 页 · {{ t.n_rows || '—' }}×{{ t.n_cols || '—' }} · {{ t.cells.length }} 单元格</div> |
Comment on lines 64 to +68
| chain = prompt | self._chat_model | ||
| message = await chain.ainvoke({"system_message": system, "user_message": user}) | ||
| return _message_to_text(message) | ||
| if json_schema and not self._supports_native_structured_output(): | ||
| user = _with_json_instructions(user, json_schema) |
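The snippet above injects JSON instructions into the user message when the provider lacks native structured output. The bodies of `_with_json_instructions` and the PR's payload parser are not shown, so the following is only a sketch of how such a prompt-based fallback might look. `with_json_instructions` and `parse_json_payload` are hypothetical names, and the instruction wording is an assumption.

```python
import json
import re

# Build the Markdown fence marker without writing it literally.
_FENCE = "`" * 3
_FENCED_BLOCK = re.compile(
    re.escape(_FENCE) + r"(?:json)?\s*(.*?)" + re.escape(_FENCE),
    re.DOTALL | re.IGNORECASE,
)


def with_json_instructions(user: str, json_schema: dict) -> str:
    """Append the schema and formatting rules to the user message."""
    schema_text = json.dumps(json_schema, ensure_ascii=False, indent=2)
    return (
        f"{user}\n\n"
        "Respond with a single JSON object that conforms to this JSON "
        "Schema. Do not add commentary or Markdown fences.\n"
        f"{schema_text}"
    )


def parse_json_payload(text: str) -> dict:
    """Extract a JSON object from a model reply, tolerating wrappers."""
    candidate = text.strip()

    # Models often wrap JSON in a fenced block despite instructions.
    fenced = _FENCED_BLOCK.search(candidate)
    if fenced:
        candidate = fenced.group(1).strip()

    try:
        payload = json.loads(candidate)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in surrounding prose.
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model reply")
        payload = json.loads(candidate[start : end + 1])

    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload
```

The parsed dict would still need validation against the schema (for example with the project's Pydantic models) before use; this sketch only recovers the payload.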
Comment on lines 122 to +126
|
|
||
| def _load_enriched_meta(doc_id: str) -> DocumentMeta | None: | ||
| meta = store.load_meta(doc_id) | ||
| if meta is None: | ||
| return None |
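`_load_enriched_meta` is cut off after the `None` check, so the enrichment step itself is not visible. As a rough illustration of the idea described in the overview (filling missing metadata from extracted pages), the sketch below fills only empty fields and never overwrites stored values. The `DocumentMeta` dataclass here is a simplified stand-in, and `infer_period_end`, `enrich_meta`, the page dict shape, and the ISO-date heuristic are all assumptions.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DocumentMeta:
    """Simplified stand-in for the project's metadata model."""
    doc_id: str
    company: Optional[str] = None
    period_end: Optional[str] = None


# Assumption: period end appears as an ISO date in the page text.
_ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def infer_period_end(pages: list[dict]) -> Optional[str]:
    """Return the first ISO date found in the extracted page text."""
    for page in pages:
        match = _ISO_DATE.search(page.get("text", ""))
        if match:
            return match.group(1)
    return None


def enrich_meta(meta: DocumentMeta, pages: list[dict]) -> DocumentMeta:
    """Fill missing fields from pages; keep values that are already set."""
    if meta.period_end is None:
        inferred = infer_period_end(pages)
        if inferred is not None:
            return replace(meta, period_end=inferred)
    return meta
```

Returning a new object instead of mutating the stored one keeps read paths side-effect free; the PR separately persists enriched metadata during pipeline finalize (`src/agent/nodes.py`).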
| OPENAI_MODEL=gpt-4.1-mini | ||
| DEEPSEEK_API_KEY= | ||
| DEEPSEEK_BASE_URL=https://api.deepseek.com/v1 | ||
| DEEPSEEK_BASE_URL=https://api.deepseek.com |
| Use this path when you want the full local system with background worker and infrastructure services. | ||
|
|
||
| ```bash | ||
| copy .env.example .env |
Summary
This PR rebases the document review fixes onto the latest `main` after PR #10 and bundles the follow-up runtime fixes that were validated locally.
What changed
- … `queued` tasks when the frontend follow-up request is missed
Validation
- `python -m pytest tests/test_routes_web.py tests/test_storage.py tests/test_schemas.py tests/test_openai_compatible_client.py -q`
- `npm run typecheck`
- `npm run test:unit`
Notes
- … `origin/main`, which already includes PR #10 (Add agent intelligence analysis layer).