feat(papers): add pubmed_search, fetch_pubmed, fetch_doi to hf_papers tool #108

Open
lucianfialho wants to merge 1 commit into huggingface:main from lucianfialho:feat/pubmed-doi-compact-paper-fetch

@lucianfialho

Summary

Extends hf_papers with three new operations covering biomedical and DOI-based literature that the current tool can't reach:

  • pubmed_search — keyword search over PubMed via NCBI E-utilities (esearch + esummary). Covers clinical, biomedical, and pharmacological literature not indexed on arXiv/HF Hub.
  • fetch_pubmed — fetch abstract + metadata for a PMID via efetch XML. Returns title, authors, journal, year, DOI link.
  • fetch_doi — fetch metadata + abstract for any DOI via Crossref API. Covers bioRxiv, medRxiv, PsyArXiv, and journal articles. Useful when you have a DOI from a citation graph result but no arXiv ID.
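The three operations map onto a small set of public HTTP endpoints. As a rough sketch (not the code in this PR), the requests could be assembled as shown below. The helper names are hypothetical; the endpoint paths and the `db`, `term`, `retmax`, `retmode`, and `id` parameters are the standard NCBI E-utilities ones, and `/works/{doi}` is the Crossref REST route.

```python
from urllib.parse import quote, urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
CROSSREF = "https://api.crossref.org/works"


def esearch_url(query: str, limit: int = 5) -> str:
    """Step 1 of pubmed_search: resolve a keyword query to a list of PMIDs."""
    params = {"db": "pubmed", "term": query, "retmax": limit, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esummary_url(pmids: list[str]) -> str:
    """Step 2 of pubmed_search: fetch short summaries for those PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"


def efetch_url(pmid: str) -> str:
    """fetch_pubmed: request the full record (including abstract) as XML."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def crossref_url(doi: str) -> str:
    """fetch_doi: Crossref exposes one work per DOI under /works/."""
    # Keep the slash between DOI prefix and suffix; escape anything unusual.
    return f"{CROSSREF}/{quote(doi, safe='/')}"
```

Each URL can then be passed to `httpx.get(...)`; keeping URL construction separate from the network call makes the request shape easy to unit-test offline.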

Why this gap matters

The existing search operation is tuned for HF/arXiv, which works well for CS/ML. But research_tool increasingly needs to pull biomedical papers (clinical benchmarks, neuroscience, drug interaction studies), and these are indexed in PubMed or addressable only by DOI, not on arXiv. This PR lets hf_papers reach arXiv, PubMed, and DOI-addressable literature from a single tool.

Implementation

  • No new dependencies — httpx (already present) + stdlib xml.etree.ElementTree
  • Follows the existing _op_* / ToolResult pattern exactly
  • Registered in both _OPERATIONS and HF_PAPERS_TOOL_SPEC (enum + parameter docs)
  • pmid and doi parameters added to schema with removeprefix normalization (accepts pmid:38903003 or 38903003)
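To illustrate the two stdlib pieces mentioned above, here is a minimal sketch of prefix normalization with `str.removeprefix` (Python 3.9+) and of pulling the returned fields out of an efetch response with `xml.etree.ElementTree`. The function names and the returned dict shape are assumptions made for illustration and may differ from the PR's code; the element paths follow the PubMed article XML layout.

```python
import xml.etree.ElementTree as ET


def normalize_id(value: str, prefix: str) -> str:
    """Accept both 'pmid:38903003' and '38903003' (hypothetical helper)."""
    return value.strip().removeprefix(prefix).strip()


def parse_pubmed_xml(xml_text: str) -> dict:
    """Extract title, authors, journal, year, DOI and abstract from efetch XML."""
    root = ET.fromstring(xml_text)
    article = root.find(".//PubmedArticle")
    if article is None:
        return {}

    def text(path: str) -> str:
        # itertext() keeps text nested inside inline markup such as <i>.
        node = article.find(path)
        return "".join(node.itertext()).strip() if node is not None else ""

    authors = [
        f"{a.findtext('ForeName', '')} {a.findtext('LastName', '')}".strip()
        for a in article.findall(".//AuthorList/Author")
    ]
    # Structured abstracts are split into several labelled AbstractText parts.
    abstract = " ".join(
        "".join(n.itertext()).strip()
        for n in article.findall(".//Abstract/AbstractText")
    )
    doi = next(
        (
            i.text or ""
            for i in article.findall(".//PubmedData/ArticleIdList/ArticleId")
            if i.get("IdType") == "doi"
        ),
        "",
    )
    return {
        "title": text(".//ArticleTitle"),
        "authors": authors,
        "journal": text(".//Journal/Title"),
        "year": text(".//Journal/JournalIssue/PubDate/Year"),
        "doi": doi,
        "abstract": abstract,
    }
```

Records that carry only a `MedlineDate` instead of a `Year` would come back with an empty `year` in this sketch; a fuller implementation would need a fallback for that case.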

Example usage

# Search biomedical literature
hf_papers(operation="pubmed_search", query="psilocybin anxiety clinical trial", limit=5)

# Fetch by PMID
hf_papers(operation="fetch_pubmed", pmid="38903003")

# Fetch a bioRxiv preprint by DOI
hf_papers(operation="fetch_doi", doi="10.1101/2023.12.15.571821")

Closes #93

Test plan

  • pubmed_search returns results for a known biomedical query
  • fetch_pubmed returns abstract for pmid:38903003 (known good)
  • fetch_doi returns metadata for a bioRxiv DOI
  • fetch_doi handles a paywalled journal DOI gracefully (returns Crossref metadata even if the full text is unavailable)
  • Existing operations unaffected (schema enum extended, not replaced)
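The "graceful" case in the plan above mostly comes down to tolerating missing fields in the Crossref response. A minimal sketch, assuming the JSON `message` object has already been fetched: `summarize_crossref` is a hypothetical helper, while `title`, `container-title`, `author`, `abstract`, and `DOI` are fields Crossref returns (the abstract, when present, is JATS-tagged XML and is often absent for journal articles).

```python
import re


def summarize_crossref(message: dict) -> dict:
    """Build a compact record from a Crossref work, tolerating missing fields."""
    # Crossref returns title and container-title as lists, possibly empty.
    title = (message.get("title") or [""])[0]
    journal = (message.get("container-title") or [""])[0]

    authors = []
    for a in message.get("author", []):
        name = " ".join(p for p in (a.get("given"), a.get("family")) if p)
        # Organisational authors carry a single "name" field instead.
        authors.append(name or a.get("name", ""))

    raw = message.get("abstract")
    # Strip JATS tags such as <jats:p>; None signals "no abstract deposited".
    abstract = re.sub(r"<[^>]+>", "", raw).strip() if raw else None

    return {
        "doi": message.get("DOI", ""),
        "title": title,
        "authors": authors,
        "journal": journal,
        "abstract": abstract,
    }
```

Returning `None` for a missing abstract, rather than raising, lets the caller still report title, authors, and journal for a paywalled article.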

Extends hf_papers tool with three new operations covering biomedical
and DOI-based literature beyond arXiv:

- pubmed_search: keyword search via NCBI E-utilities (esearch + esummary)
- fetch_pubmed: fetch abstract for a PMID via efetch XML
- fetch_doi: fetch metadata + abstract for any DOI via Crossref API
  (covers bioRxiv, medRxiv, PsyArXiv, journal articles)

No new dependencies — uses httpx (already a dep) and stdlib xml.etree.
All three operations follow the existing ToolResult pattern and are
registered in _OPERATIONS / HF_PAPERS_TOOL_SPEC.

Closes huggingface#93
Development

Successfully merging this pull request may close these issues.

paper7 as a lightweight paper-fetching tool for the ToolRouter