feat(papers): add pubmed_search, fetch_pubmed, fetch_doi to hf_papers tool #108
Open
lucianfialho wants to merge 1 commit into huggingface:main
Extends hf_papers tool with three new operations covering biomedical and DOI-based literature beyond arXiv:

- pubmed_search: keyword search via NCBI E-utilities (esearch + esummary)
- fetch_pubmed: fetch abstract for a PMID via efetch XML
- fetch_doi: fetch metadata + abstract for any DOI via Crossref API (covers bioRxiv, medRxiv, PsyArXiv, journal articles)

No new dependencies: uses httpx (already a dep) and stdlib xml.etree. All three operations follow the existing ToolResult pattern and are registered in _OPERATIONS / HF_PAPERS_TOOL_SPEC.

Closes huggingface#93
Summary
Extends `hf_papers` with three new operations covering biomedical and DOI-based literature that the current tool can't reach:

- `pubmed_search`: keyword search over PubMed via NCBI E-utilities (`esearch` + `esummary`). Covers clinical, biomedical, and pharmacological literature not indexed on arXiv/HF Hub.
- `fetch_pubmed`: fetch abstract + metadata for a PMID via `efetch` XML. Returns title, authors, journal, year, and DOI link.
- `fetch_doi`: fetch metadata + abstract for any DOI via the Crossref API. Covers bioRxiv, medRxiv, PsyArXiv, and journal articles. Useful when you have a DOI from a citation graph result but no arXiv ID.

Why this gap matters
The existing `search` operation is HF/arXiv-tuned, which works well for CS/ML. But `research_tool` increasingly needs to pull biomedical papers (clinical benchmarks, neuroscience, drug interaction studies). These live in PubMed/DOI space, not on arXiv. This PR makes `hf_papers` a single entry point for arXiv, PubMed, and DOI-resolvable literature.

Implementation

- No new dependencies: `httpx` (already present) + stdlib `xml.etree.ElementTree`
- Follows the existing `_op_*` / `ToolResult` pattern exactly
- Registered in `_OPERATIONS` and `HF_PAPERS_TOOL_SPEC` (enum + parameter docs)
- `pmid` and `doi` parameters added to the schema with `removeprefix` normalization (accepts `pmid:38903003` or `38903003`)

Example usage
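A minimal sketch of the `removeprefix` normalization and the request targets the three operations rely on. This is not the PR's code: the helper names (`normalize_pmid`, `normalize_doi`, `eutils_url`, `crossref_url`), the accepted `doi:` and `https://doi.org/` prefixes, and the placeholder DOI `10.1234/example` are assumptions for illustration. The base URLs are the public NCBI E-utilities and Crossref REST endpoints.

```python
from urllib.parse import quote, urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
CROSSREF_BASE = "https://api.crossref.org/works"


def normalize_pmid(raw: str) -> str:
    """Accept 'pmid:38903003' or '38903003' and return the bare numeric ID."""
    pmid = raw.strip().lower().removeprefix("pmid:").strip()
    if not pmid.isdigit():
        raise ValueError(f"not a valid PMID: {raw!r}")
    return pmid


def normalize_doi(raw: str) -> str:
    """Strip common DOI prefixes (assumed set) and return the bare DOI."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def eutils_url(endpoint: str, **params: str) -> str:
    """Build an E-utilities URL (esearch, esummary, efetch) against PubMed."""
    query = urlencode({"db": "pubmed", **params})
    return f"{EUTILS_BASE}/{endpoint}.fcgi?{query}"


def crossref_url(doi: str) -> str:
    """Build the Crossref works URL for a DOI, keeping '/' unescaped."""
    return f"{CROSSREF_BASE}/{quote(normalize_doi(doi), safe='/')}"


# pubmed_search: esearch returns PMIDs, esummary returns their metadata.
search_url = eutils_url("esearch", term="drug interaction", retmode="json")
summary_url = eutils_url("esummary", id="38903003", retmode="json")
# fetch_pubmed: efetch returns the full record as XML.
fetch_url = eutils_url("efetch", id=normalize_pmid("pmid:38903003"), retmode="xml")
# fetch_doi: a single GET against Crossref (placeholder DOI).
doi_url = crossref_url("doi:10.1234/example")
```

Each URL can then be passed to `httpx.get`; keeping the normalization separate from the HTTP call makes the prefix handling easy to check in isolation.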
Closes #93
Test plan
- `pubmed_search` returns results for a known biomedical query
- `fetch_pubmed` returns abstract for pmid:38903003 (known good)
- `fetch_doi` returns metadata for a bioRxiv DOI
- `fetch_doi` handles a paywalled journal DOI gracefully (returns Crossref metadata even if full text unavailable)
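
The parsing paths behind these checks can also be exercised without network access. The following is a hedged sketch, assuming hypothetical helpers `parse_efetch_xml` and `parse_crossref` (not the PR's functions) and hand-written sample payloads that mimic the shape of `efetch` XML and Crossref JSON. The sample PMID, title, author, and DOI are placeholders, not real records.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written placeholder record in the shape of an efetch response.
SAMPLE_EFETCH = """\
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <Journal>
          <JournalIssue><PubDate><Year>2024</Year></PubDate></JournalIssue>
          <Title>Placeholder Journal</Title>
        </Journal>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="doi">10.1234/example</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""


def _text(node) -> str:
    """Full text of an element, including text inside inline markup."""
    return "".join(node.itertext()).strip() if node is not None else ""


def parse_efetch_xml(xml_text: str) -> dict:
    """Extract title, authors, journal, year, DOI, and abstract from efetch XML."""
    root = ET.fromstring(xml_text)
    article = root.find(".//PubmedArticle")
    if article is None:
        raise ValueError("no PubmedArticle element in response")
    authors = [
        f"{a.findtext('ForeName', '')} {a.findtext('LastName', '')}".strip()
        for a in article.iterfind(".//AuthorList/Author")
    ]
    # Structured abstracts have several AbstractText sections; join them.
    abstract = " ".join(_text(n) for n in article.iterfind(".//Abstract/AbstractText"))
    return {
        "title": _text(article.find(".//ArticleTitle")),
        "authors": authors,
        "journal": article.findtext(".//Journal/Title"),
        "year": article.findtext(".//JournalIssue/PubDate/Year"),
        "doi": article.findtext("PubmedData/ArticleIdList/ArticleId[@IdType='doi']"),
        "abstract": abstract or None,
    }


def parse_crossref(payload: dict) -> dict:
    """Extract metadata from a Crossref works payload; the abstract may be absent."""
    msg = payload.get("message", {})
    raw = msg.get("abstract")  # JATS-tagged string when present
    # Crude tag stripping: replace each tag with a space, then collapse whitespace.
    abstract = " ".join(re.sub(r"<[^>]+>", " ", raw).split()) if raw else None
    authors = [
        " ".join(p for p in (a.get("given"), a.get("family")) if p)
        for a in msg.get("author", [])
    ]
    return {
        "title": (msg.get("title") or [""])[0],
        "journal": (msg.get("container-title") or [""])[0],
        "authors": authors,
        "doi": msg.get("DOI"),
        "abstract": abstract,
    }
```

Returning `None` for a missing abstract, rather than raising, is one way to get the graceful behaviour described for paywalled DOIs: the caller still receives title, authors, and journal.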