
Add APA DOCX manuscript export with reference deduplication #20

Closed
matrixflora wants to merge 1 commit into HolobiomicsLab:main from matrixflora:main

@matrixflora

Summary

Checklist

  • Tests pass locally: uv run pytest
  • Linting passes: uv run ruff check src/ tests/
  • Type check passes: uv run mypy src/
  • Changelog / ROADMAP updated if this is a user-facing change
  • Docs updated if the public API surface changed (CLI flags, REST endpoints, MCP tools, config keys)
  • New dependencies reviewed for license compatibility (Apache 2.0-compatible only)
  • No unrelated cleanup mixed in (keep diffs focused)
  • CLA signed if this is an external contribution (see CONTRIBUTING.md)

Test plan

Related issues

@lfnothias

Collaborator

Blocking review — not mergeable as-is, and it conflicts with the "stay generalizable" goal:

  1. Debug prints left in: print("OUTPUT PATH:", output_path) in both _generate_answer and _generate_single_paper_answer.
  2. Always-on side effect: a .docx is written to output/{id}_manuscript.docx on every agentic answer. This should be opt-in (a config flag, default off), not forced on all users.
  3. Core dependency for a niche feature: python-docx is added to base dependencies. It should be an optional extra (e.g. [docx]), imported lazily so the dep is only needed when the feature is enabled.
  4. Likely runtime bug: export_apa_docx/to_apa treat papers as dicts (paper.get(...)), but the orchestrator passes Paper objects — the bare except would swallow the resulting AttributeError and silently write blank references.
  5. No test.
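To illustrate point 4, here is a minimal sketch using a hypothetical Paper dataclass as a stand-in for the orchestrator's object (its fields are assumptions). Dict-style paper.get(...) inside a bare except turns the AttributeError into an empty string, whereas a small accessor (the hypothetical get_field) that checks for a mapping before falling back to getattr handles both shapes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Paper:
    # Hypothetical stand-in for the orchestrator's Paper object.
    title: str
    year: Optional[int] = None


def buggy_title(paper: Any) -> str:
    # Mirrors the reviewed pattern: dict-style access guarded by a bare except.
    try:
        return paper.get("title", "")
    except:  # noqa: E722 (kept on purpose to show the failure mode)
        # Paper has no .get(), so the AttributeError is swallowed here
        # and the reference silently comes out blank.
        return ""


def get_field(obj: Any, name: str, default: Any = None) -> Any:
    """Hypothetical accessor: read `name` from a dict or from an object attribute."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


if __name__ == "__main__":
    paper = Paper(title="Holobiont metabolomics", year=2024)
    print(repr(buggy_title(paper)))  # '' : the blank reference from point 4
    print(get_field(paper, "title"))  # works for objects
    print(get_field({"title": "Same paper"}, "title"))  # and for dicts
```

Narrowing the handler to the exceptions you actually expect (or logging them) would also have surfaced this bug instead of hiding it.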

Happy to rework this into an opt-in, optional-extra, tested exporter (matching the pattern used elsewhere). Converting to draft until then.

@lfnothias lfnothias marked this pull request as draft June 19, 2026 17:07
@lfnothias

Collaborator

Superseded by #24, which reworks this into a safe, generalizable form: opt-in (off by default) via config.rag_modes.agentic.export_apa_docx, python-docx as an optional [docx] extra (lazy import), tolerant of Paper/Author objects (this PR would have silently produced blank refs on real objects), no debug prints, export failures isolated from the answer path, and hermetic tests. Closing in favour of #24.
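The opt-in, lazy-import, and failure-isolation requirements listed above can be sketched as a single guard function. This is an illustration under assumptions, not the code from #24: maybe_export, demo_config, and the injected write callable are hypothetical, and the export_apa_docx_dir key is a guess at the "(+ _dir)" setting mentioned in the commit message. The default path follows the output/{id}_manuscript.docx pattern quoted in the review.

```python
import logging
from types import SimpleNamespace
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


def _load_docx() -> Any:
    # Lazy import: python-docx is only needed once the feature is enabled.
    try:
        import docx  # provided by the optional python-docx package
    except ImportError as exc:
        raise ImportError(
            "APA DOCX export needs the optional extra: uv sync --extra docx"
        ) from exc
    return docx


def maybe_export(
    config: Any,
    answer_id: str,
    write: Callable[[Any, str], None],
) -> Optional[str]:
    """Hypothetical guard: return the written path, or None if disabled or failed."""
    agentic = config.rag_modes.agentic
    if not getattr(agentic, "export_apa_docx", False):
        return None  # default off: no files written unless the user opts in
    # Assumed key name for the "(+ _dir)" setting; falls back to "output".
    out_dir = getattr(agentic, "export_apa_docx_dir", None) or "output"
    path = f"{out_dir}/{answer_id}_manuscript.docx"
    try:
        write(_load_docx(), path)
    except Exception:
        # Isolated from the answer path: a missing extra or a write error
        # (ImportError is an Exception subclass) is logged, never re-raised.
        log.warning("APA DOCX export failed for %s", answer_id, exc_info=True)
        return None
    return path


def demo_config(**agentic: Any) -> SimpleNamespace:
    # Hypothetical helper that mimics config.rag_modes.agentic for demos/tests.
    return SimpleNamespace(rag_modes=SimpleNamespace(agentic=SimpleNamespace(**agentic)))


if __name__ == "__main__":
    print(maybe_export(demo_config(), "q1", lambda docx, path: None))  # None
```

Injecting write keeps the guard testable without python-docx installed; a real writer would typically call docx.Document(), add_paragraph(), and save(path) from python-docx.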

@lfnothias lfnothias closed this Jun 19, 2026
lfnothias added a commit that referenced this pull request Jun 19, 2026
Generalizable rework of the approach in #20 (closed as draft):
- OFF by default via config.rag_modes.agentic.export_apa_docx (+ _dir).
- python-docx is an optional [docx] extra (uv sync --extra docx), imported
  lazily; absent extra raises a clear ImportError, swallowed off the answer path.
- Exporter tolerates Paper objects, dicts, and Author objects/strings
  (the original treated papers/authors as dicts/strings and silently failed).
- No debug prints; export failures never break the answer; references deduped.
- Hermetic tests: APA formatting, config default-off, disabled-helper no-op,
  and a real .docx write (skipped without the extra).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
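The commit message mentions reference deduplication and an exporter that accepts Paper objects, dicts, and Author objects or strings. One way to get both is sketched below; the Paper and Author dataclasses, the DOI-then-normalised-title dedup key, and the simplified APA-style formatting (no surname/initial inversion) are all assumptions rather than what #24 actually does.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Optional


@dataclass
class Author:
    # Hypothetical stand-in for the project's Author object.
    name: str


@dataclass
class Paper:
    # Hypothetical stand-in for the project's Paper object.
    title: str
    authors: List[Any] = field(default_factory=list)
    year: Optional[int] = None
    doi: Optional[str] = None


def _get(obj: Any, name: str, default: Any = None) -> Any:
    # Works for both dicts and attribute-bearing objects.
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def _author_name(author: Any) -> str:
    # Accept plain strings, dicts, or objects exposing .name.
    if isinstance(author, str):
        return author
    return str(_get(author, "name", "") or "")


def _dedup_key(paper: Any) -> str:
    # Prefer the DOI (case-insensitive); fall back to a whitespace-normalised title.
    doi = _get(paper, "doi")
    if doi:
        return "doi:" + str(doi).strip().lower()
    title = str(_get(paper, "title", "") or "")
    return "title:" + " ".join(title.lower().split())


def to_apa(paper: Any) -> str:
    """Simplified APA-style entry: Authors (Year). Title. https://doi.org/DOI"""
    names = [n for n in (_author_name(a) for a in (_get(paper, "authors") or [])) if n]
    if len(names) > 1:
        authors = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        authors = names[0] if names else ""
    year = _get(paper, "year")
    date = f"({year})" if year else "(n.d.)"  # APA uses "n.d." when no date
    title = str(_get(paper, "title", "") or "").strip().rstrip(".")
    entry = f"{authors} {date}. {title}.".strip()
    doi = _get(paper, "doi")
    return f"{entry} https://doi.org/{doi}" if doi else entry


def dedup_references(papers: Iterable[Any]) -> List[str]:
    """Format papers as APA-style entries, keeping the first occurrence per key."""
    seen = set()
    out: List[str] = []
    for paper in papers:
        key = _dedup_key(paper)
        if key in seen:
            continue
        seen.add(key)
        out.append(to_apa(paper))
    return out
```

Keying on the DOI first catches the common case where the same paper arrives via two retrieval paths with different title casing; the title fallback is a heuristic and could merge distinct papers that happen to share a title.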

2 participants