Surface web search/fetch citations across all providers #318
Draft
cpsievert wants to merge 7 commits into
10560f5 to 9a944ca
cpsievert
commented
Jun 12, 2026
…entToolResponseSearch
- Add `Citation` (url, title, cited_text) and `Source` (url, title, fetch_status) types
- `ContentText.citations` holds a list of `Citation` objects; merges on `__add__`
- `ContentToolResponseSearch.sources` replaces the old `urls` field, adding `fetch_status`
- Export `Citation` and `Source` from the public `chatlas.types` namespace
…eam_other_contents
`stream_content()` is now the single hook for streaming: it returns a list of `Content` objects emitted at each chunk. The old per-type hooks (`stream_text`, `stream_other_contents`) are removed; providers and accumulators use the unified list contract instead. `Chat` iterates the returned list to dispatch yields.
…pic, Google)
Each provider now populates `ContentText.citations` from its web-search results and emits `ContentCitation` items via `stream_content` during streaming:
- OpenAI: maps `url_citation` annotations to `Citation`; streams via `annotation.added`
- Anthropic: transfers web-search result citations to `ContentText`; streams citations interleaved at `content_block_stop`
- Google: surfaces grounding/url-context metadata as `Citation`/`Source`; emits citations at the final chunk via `stream_content`
- Document Citation, Source, and ContentCitation in the API reference
- Add sidebar/quarto entries for the new citation types
- Regenerate openai/_submit*.py type stubs to pick up the new moderation param
- CHANGELOG entry for the citation content model feature
…, Breaking Changes last
…pe with flat url/title
Remove the dual citation representation (Citation class + ContentText.citations vs ContentCitation) and unify on ContentCitation as the sole citation type in both streaming and final turns. ContentCitation now carries url and title directly (no wrapper); stream position is the placement signal.
e1ef76b to 172147e
Why this matters
When a model answers using web search or fetch tools, the most useful artifact is which sources back the answer. Previously chatlas dropped that information — OpenAI's citation annotations were discarded, Anthropic's citation deltas never reached normalized content, and Google's grounding metadata was ignored entirely.
This PR gives chatlas a normalized citation model so grounded answers carry their sources through to turns uniformly — both progressively during streaming and on the final turn — enabling downstream UIs (e.g. shinychat) to render footnote markers and source lists.
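As a rough illustration of what a downstream UI could do with such a model, here is a minimal sketch that turns a list of content items into text with footnote markers plus a source list. It assumes citations follow the text they ground in the list. The `ContentText` and `ContentCitation` classes below are simplified stand-ins defined only for this example (not imported from chatlas), and `render_with_footnotes` is a hypothetical helper, not part of this PR.

```python
from dataclasses import dataclass


# Simplified stand-ins for illustration; the real chatlas types may differ.
@dataclass
class ContentText:
    text: str


@dataclass
class ContentCitation:
    url: str
    title: str


def render_with_footnotes(contents):
    """Place an [n] marker wherever a citation occurs in the list.

    Repeated URLs reuse the number assigned at first sight.
    """
    body = []
    numbers = {}  # url -> footnote number, in first-seen order
    notes = []
    for item in contents:
        if isinstance(item, ContentText):
            body.append(item.text)
        elif isinstance(item, ContentCitation):
            if item.url not in numbers:
                numbers[item.url] = len(numbers) + 1
                notes.append(f"[{numbers[item.url]}] {item.title} <{item.url}>")
            body.append(f"[{numbers[item.url]}]")
    return "".join(body), notes


contents = [
    ContentText("First claim."),
    ContentCitation("https://example.com/a", "Source A"),
    ContentText(" Second claim."),
    ContentCitation("https://example.com/b", "Source B"),
    ContentCitation("https://example.com/a", "Source A"),
]
text, notes = render_with_footnotes(contents)
print(text)
print("\n".join(notes))
```

Because the marker position comes from list order alone, the same function would work on partially streamed content and on a completed turn.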
What's now possible
- `ContentCitation(url, title)` appears in the turn's `contents` list after the text it grounds, in stream order. Its position relative to surrounding `ContentText` items is the placement signal, so no offsets or span-matching are needed.
- When streaming (`content="all"`), `ContentCitation` objects arrive interleaved with text (OpenAI/Anthropic) or at stream end (Google), so UIs can render citations progressively.
- On the final turn, `ContentCitation` items sit in the same `contents` list in the same order, so streaming and replay have one shape, not two.
- Web search results carry `Source` objects (`url`, `title`, `domain`), and web fetch results carry a `status` field.
- `ContentCitation` and `Source` are exported from `chatlas.types`.
Breaking change
`ContentToolResponseSearch.urls` (`list[str]`) is replaced by `.sources` (`list[Source]`). Migrate `x.urls` → `[s.url for s in x.sources]`. (These content types are recent, so blast radius is small.)
Notes for reviewers
- The core changes are in each `_provider_*.py`'s `stream_content()` and `_as_turn()`. All three providers filter `ContentCitation` out of turn serialization (it's client-side metadata, not sent back to APIs).
- `stream_content()` now returns `list[Content]` instead of `Optional[Content]`, allowing a single event to produce multiple content items (e.g. text + citations at block stop).
Test plan
- `uv run pyright`: 0 errors
- Tests cover `ContentCitation` items in both streaming and final turns (all replaying VCR cassettes offline)
- `make check` in CI
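To make the two reviewer notes concrete, here is a small sketch of the list-returning hook and the citation filter. It is not the provider code from this PR: the chunk dictionaries, the stand-in content classes, and the `dispatch` and `serializable` helpers are all hypothetical and exist only to show the shape of the contract.

```python
from dataclasses import dataclass
from typing import Iterator, Union


# Simplified stand-ins for illustration; the real chatlas types may differ.
@dataclass
class ContentText:
    text: str


@dataclass
class ContentCitation:
    url: str
    title: str


Content = Union[ContentText, ContentCitation]


def stream_content(chunk: dict) -> list[Content]:
    """Hypothetical hook: one chunk may produce zero, one, or many items."""
    out: list[Content] = []
    if chunk.get("text"):
        out.append(ContentText(chunk["text"]))
    for c in chunk.get("citations", []):
        out.append(ContentCitation(c["url"], c["title"]))
    return out


def dispatch(chunks: list[dict]) -> Iterator[Content]:
    """Caller side: iterate each returned list; an empty list yields nothing."""
    for chunk in chunks:
        yield from stream_content(chunk)


def serializable(contents: list[Content]) -> list[Content]:
    """Drop citations before a turn is sent back to a provider API."""
    return [c for c in contents if not isinstance(c, ContentCitation)]


chunks = [
    {"text": "Grounded answer."},
    {},  # an event with no content produces an empty list
    {"text": " More.", "citations": [{"url": "https://example.com", "title": "Example"}]},
]
streamed = list(dispatch(chunks))
```

With an `Optional[Content]` return, the third chunk would need either two events or a wrapper type; returning a list lets the caller use one loop for every case.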